(57) ABSTRACT

In a graphics pipeline, a rasterizer circuit generates frag- 
ments for an image having multiple surfaces that have been 
tessellated into primitive objects, such as triangles. First and 
second fragments are associated with the same pixel. A 
merge buffer merges the first fragment with the second 
fragment when the two fragments belong to the same 
tessellated surface, the first fragment's primitive is adjacent 
to the second fragment's primitive, both fragments face 
either toward or away from the viewer, and the first and 
second fragment are sufficiently similar that merging is 
unlikely to introduce visually objectionable artifacts. A 
frame buffer receives fragments from the merge buffer, 
stores the fragments, combines the fragments into pixels, 
and outputs the pixels to a display. 
SYSTEM AND METHOD FOR PRODUCING AN ANTIALIASED IMAGE USING A MERGE BUFFER

This application claims priority on U.S. provisional patent application 60/226,500, filed Aug. 18, 2000.

The present invention relates generally to computer graphics, and more particularly to a system and method for reducing memory and processing bandwidth requirements of a computer graphics system by using a buffer in a graphics pipeline to merge selected image fragments before they reach a frame buffer.

BACKGROUND OF THE INVENTION

Many computer graphics systems use pixels to define images. The pixels are arranged on a display screen, such as a raster display, as a rectangular array of points. Two-dimensional (2D) and three-dimensional (3D) scenes are drawn on the display by selecting the light intensity and the color of each of the display's pixels; such drawing is referred to as rendering.

Rendering a scene has many steps. One rendering step is rasterization. A scene is made up of objects. For example, in a scene of a kitchen, the objects include a refrigerator, counters, stove, etc. Rasterization is a process by which the following is determined for each object in the scene: (1) identifying the subset of the display's pixels that are contained within the object, and then for each pixel in this subset, (2) identifying the information that is later used to determine the color and intensity to assign to each pixel. Rasterization of an object generates a fragment for each pixel the object either fully or partially covers, and the information identified in (2) above is called fragment data.

A scene may be composed of arbitrarily complex objects. Before rendering such a scene by a computer system, a process called tessellation decomposes the complex objects into simpler (primitive), planar objects. Typically, systems decompose the complex objects into triangles. For example, polygons with four or more vertices are decomposed into two or more triangles. Curved surfaces, such as on a sphere, are also approximated by a set of triangles. These triangles are then rasterized. Though with minor modifications the invention could work with primitives with more sides, for example, quadrilaterals, hereafter we assume that all surfaces are tessellated into triangles. "Primitives" with more sides will only arise as a consequence of merging fragments from two or more triangles.

In FIG. 1, a tessellated surface 30 has three primitive objects: triangle one 32-1, triangle two 32-2 and triangle three 32-3. The edges of the tessellated surface 30 are depicted with wide lines. To illustrate the rasterization process, the tessellated surface 30 is superimposed on an exemplary pixel grid 40. Each pixel 42 of the pixel grid 40 is represented by a square. The rasterization process generates a fragment for each primitive object that is superimposed on a pixel 42.

In the rasterization process, a finite array of discrete points, each point representing the center of a pixel of the display device, is used to construct a regular grid, for example the pixel grid 40. To construct such a grid, a filter kernel is placed over each of the discrete points. The two-dimensional bounding shape of the portion of the filter that has non-zero weight is sometimes called the support in signal processing theory, but is commonly referred to as the footprint. In the general case, the filter footprints of neighboring pixels overlap each other and thus intersect.

Typically, hardware-based rasterizers use filter footprints that are 1x1 pixel squares and thus do not overlap. Such a filter was used to create pixel grid 40. Each square in pixel grid 40 is the filter footprint of a 1x1 pixel square filter placed over the discrete pixel point at the center of the square. This pixel grid 40 is used to generate fragments.

The fragments of an object are obtained by projecting the object onto the pixel grid. A fragment is then generated for a given pixel if the footprint of the filter located over the pixel intersects the object. To illustrate the rasterization process, rasterization of the three triangles 32 yields a number of fragments for each triangle 32. Within each pixel 42, the number enclosed by a circle is the number of fragments that are generated for that pixel on behalf of one or more primitive objects. For example, since tessellated surface 30 does not cover pixel 42-1, no fragments are associated with pixel 42-1. Since triangle 32-2 partially covers pixel 42-2, one fragment 44 is associated with pixel 42-2. Since all three triangles 32-1, 32-2 and 32-3 partially cover pixel 42-3, three fragments 46 are generated for pixel 42-3. Because none of the three fragments 46-1, 46-2, 46-3 completely covers pixel 42-3, pixel 42-3 is displayed with a color that is a combination of the three fragments 46-1, 46-2, 46-3 and the background color.

The grid 40 depicts the filter footprints obtained by locating a filter with a 1x1 pixel square footprint over each pixel center in the pixel grid. For example, square 48 in grid 40 represents the footprint of the filter that is centered over the point in the pixel grid that corresponds to pixel 50. The color and intensity of a fragment is obtained by sampling the object's color and intensity at each point of intersection with the pixel's filter footprint, weighting each sample by the value of the filter at the corresponding point, and accumulating the results.

After rasterization, texture mapping is typically applied. Texture mapping is a technique for shading surfaces of objects with texture patterns, thereby increasing the realism of the scene being rendered. Texture mapping is applied to the fragments that correspond to objects for which texture mapping has been specified by the person who designed the scene. Texture mapping results in color information that is either combined with the existing color information for the fragment or replaces this data.

Once the color information is known for a fragment, the frame buffer is updated. In this step, each newly-generated fragment is either added to or blended with previously-generated fragments that correspond to the same pixel. The frame buffer stores up to N fragments per pixel, where N is greater than or equal to one. When a new fragment f is generated for a pixel P, the frame buffer replaces one of pixel P's existing fragments with the new fragment f, blends fragment f with one of the existing fragments, or stores fragment f with the existing fragments if fewer than N fragments are currently stored. In such systems, the displayed color of a pixel is obtained by blending together the new fragment f with up to N stored fragments.

Because rasterization of a scene typically yields many fragments for each pixel, the texture-mapping stage and frame buffer often process multiple fragments for the same pixel. In many cases, fragments from two or more adjoining triangles that cover the same pixel may have nearly identical color and depth values because the fragments belong to the same tessellated surface.

Artifacts are distortions in the displayed image. One source of artifacts is aliasing. Aliasing occurs because the pixels are sampled and therefore have a discrete nature.
Artifacts can appear when an entire pixel is given a light
intensity or color based upon an insufficient sample of points
within that pixel. To reduce aliasing effects in images, the
pixels can be sampled at subpixel locations within the pixel.
Each of the subpixel sample locations contributes color data
that can be used to generate the composite color of that pixel.

As shown in FIG. 2, the filter is typically evaluated at a
predefined number of discrete points 56 within the footprint.
Typically, from four to thirty-two sample points are used. In
one approach to sampling, sparse supersampling, these
points are "staggered" on a fine grid. For example, the filter
for the pixel 50 is sampled at four points 56, labeled S1, S2,
S3, and S4, chosen from a 4x4 array 60 aligned to the center
62 of the pixel 50. The term coverage mask refers to the data
that records, for the sample points 56 associated with pixel
50, whether each sample point is inside or outside of the
object being rendered. An object is said to fully cover a pixel
if all of the sample points for the pixel are inside the object;
otherwise the object is said to partially cover the pixel if at
least one sample point is inside the object.
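
The coverage-mask definition above can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions: the sample offsets, the point-in-triangle test, and the function names (`coverage_mask`, `classify`) are hypothetical and are not taken from the specification; only the ideas of four staggered sample points on a 4x4 subpixel grid, a per-sample inside/outside bit, and full versus partial coverage come from the text.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical sample pattern: four points staggered on a 4x4 subpixel grid
# (one per row and per column), given as offsets inside a unit pixel.
# The actual positions shown in FIG. 2 may differ.
SAMPLE_OFFSETS: List[Point] = [
    (0.125, 0.625),  # S1
    (0.375, 0.125),  # S2
    (0.625, 0.875),  # S3
    (0.875, 0.375),  # S4
]


def edge(a: Point, b: Point, p: Point) -> float:
    """Twice the signed area of (a, b, p); positive if p is left of a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def inside_triangle(tri: Sequence[Point], p: Point) -> bool:
    """True if p is inside or on the boundary of tri, for either winding."""
    d = [edge(tri[i], tri[(i + 1) % 3], p) for i in range(3)]
    return all(v >= 0 for v in d) or all(v <= 0 for v in d)


def coverage_mask(tri: Sequence[Point], px: int, py: int) -> List[int]:
    """One bit per sample point: 1 if the sample lies inside the triangle."""
    return [int(inside_triangle(tri, (px + ox, py + oy)))
            for ox, oy in SAMPLE_OFFSETS]


def classify(mask: Sequence[int]) -> str:
    """'full' if every sample is covered, 'partial' if some are, else 'none'."""
    if all(mask):
        return "full"
    if any(mask):
        return "partial"
    return "none"
```

With these assumed offsets, a triangle with vertices (0, 0), (4, 0), (0, 4) fully covers pixel (0, 0) and partially covers pixel (1, 2), because only two of that pixel's four sample points fall on the inner side of the diagonal edge.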

Careful examination of a supersampled pixel reveals that
the color and depth values at different sample points within
a pixel usually differ little from each other, as long as the
sample points belong to the same surface. For example, if a
pixel is completely covered by a surface, then most of the
color and depth values are likely to be fairly similar. This
similarity usually holds true even when different sample
points belong to different primitives (triangles) of the same
tessellated surface.

If a graphics accelerator processes multiple sample points
for a single fragment en masse, then it is inefficient to
process multiple fragments for a single pixel, when the
fragments belong to a single surface that has been tessellated
into multiple primitive objects. Therefore, to reduce the
memory and processing bandwidth requirements of a graph-
ics accelerator (or equivalently to reduce the amount of
processing required to render an object), a method and
apparatus are needed that merges fragments from adjoining
primitive objects of a tessellated surface that cover the same
pixel.

SUMMARY OF THE INVENTION 

In a graphics pipeline, a rasterizer circuit generates frag-
ments for an image having multiple surfaces that have been
tessellated into primitive objects, such as triangles. First and
second fragments are associated with the same pixel. A
merge buffer merges the first fragment with the second
fragment when the two fragments belong to the same
tessellated surface, the first fragment's primitive is adjacent
to the second fragment's primitive, both fragments face
either toward or away from the viewer, and the first and
second fragment are sufficiently similar that merging is
unlikely to introduce visually objectionable artifacts. A
frame buffer receives fragments from the merge buffer,
stores the fragments, combines the fragments into pixels,
and outputs the pixels to a display.

In a particular embodiment, in a graphics pipeline, a
rasterizer circuit generates fragments for an image having a
tessellated surface. First and second fragments are associ-
ated with the same pixel and are also associated with the
tessellated surface. Each fragment has an associated depth
value and color information. A merge buffer merges the first
fragment with the second fragment when the following four
criteria are met: (1) the first and second fragments are
generated sufficiently close in time, (2) the first fragment's
primitive is adjacent to (shares an edge with) the second
fragment's primitive in 3D space, (3) the first and second
fragments' primitives are oriented similarly in 3D space, and
(4) the depth value and color of the first and second
fragments are sufficiently similar. This merged fragment
may then merge with subsequent fragments if these criteria
are again met. A frame buffer receives fragments from the
merge buffer, some of which may have been merged; per-
forms a depth test; stores the resulting visible fragments;
combines color, transparency, and depth information from
all fragments associated with each pixel into a (red, green,
blue, alpha transparency) quadruplet; and outputs the qua-
druplets to a display.
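
The four merge criteria listed above can be read as a single predicate over two fragments. The Python sketch below is one possible reading, not the circuit described later in the specification. Everything concrete in it is an assumption made for illustration: the `Fragment` fields, the use of an emission sequence number to model "close in time", the modeling of "oriented similarly" as equal facing, and the threshold values `window`, `z_tol`, and `color_tol`.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Tuple

Edge = FrozenSet[int]  # an edge is an unordered pair of vertex ids


def tri_edges(v0: int, v1: int, v2: int) -> FrozenSet[Edge]:
    """The three edges of a triangle, as unordered vertex-id pairs."""
    return frozenset({frozenset({v0, v1}), frozenset({v1, v2}),
                      frozenset({v2, v0})})


@dataclass(frozen=True)
class Fragment:
    pixel: Tuple[int, int]          # x-y position of the pixel
    seq: int                        # hypothetical emission order from the rasterizer
    edges: FrozenSet[Edge]          # edges of the fragment's primitive
    front_facing: bool              # faces toward (True) or away from the viewer
    z: float                        # depth value
    rgba: Tuple[float, float, float, float]


def can_merge(a: Fragment, b: Fragment, window: int = 8,
              z_tol: float = 0.01, color_tol: float = 0.05) -> bool:
    """Apply the four criteria; all thresholds are illustrative assumptions."""
    if a.pixel != b.pixel:
        return False
    close_in_time = abs(a.seq - b.seq) <= window            # criterion (1)
    adjacent = bool(a.edges & b.edges)                      # criterion (2)
    similar_orientation = a.front_facing == b.front_facing  # criterion (3)
    similar_depth = abs(a.z - b.z) <= z_tol                 # criterion (4)
    similar_color = all(abs(x - y) <= color_tol
                        for x, y in zip(a.rgba, b.rgba))
    return (close_in_time and adjacent and similar_orientation
            and similar_depth and similar_color)
```

Two fragments from triangles (0, 1, 2) and (1, 2, 3) share the edge {1, 2}, so they pass criterion (2); changing the depth of one of them beyond `z_tol`, or flipping its facing, makes the predicate fail.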

In another aspect of the invention, the merge buffer has a 
fragment storage storing up to a predetermined number of 
fragment tuples. Each stored fragment tuple is associated 
with a fragment. It should be noted that when a fragment is 
in the merge buffer, the graphics accelerator does not yet 
know if the fragment will be visible. Each fragment tuple 
includes a coverage mask, color value, depth (Z) value, and 
a pair of depth gradient (Z gradient) values. The fragment 
tuples are also associated with an x-y position tag. A merge 
pipeline processing circuit processes a new fragment tuple 
representing a fragment to be added to the pixel. The 
pipeline processing circuit includes a sequence of pipeline 
stage circuits. A comparison stage compares an x-y position
tag of a new fragment tuple with the x-y position tags of the 
fragment tuples in the fragment storage and identifies a 
potentially mergable existing fragment tuple based on a 
result of the comparison. An evaluation stage compares 
coverage masks, primitive edges, surface normal vectors, Z 
values, and color, or a subset thereof, to determine if the new 
fragment tuple should actually be merged with the poten- 
tially mergable fragment tuple. A fragment merging stage 
merges the color value, the Z value and the pair of Z gradient 
values of the new fragment tuple and the potentially mer- 
gable fragment tuple to generate a merged fragment tuple 
based on the outcomes of the evaluation stage. An update 
fragment storage stage stores the merged fragment in the 
fragment storage. 
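
The four pipeline stages described above (comparison, evaluation, fragment merging, update fragment storage) can be sketched in software as follows. This is a simplified model under explicit assumptions, not the hardware described in the specification: the class and method names are hypothetical, the storage holds at most one tuple per x-y tag, the evaluation stage checks only that coverage masks are disjoint and Z values are close, and the merged color, Z, and Z gradients are coverage-weighted averages chosen purely for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import List, Tuple

Tag = Tuple[int, int]  # x-y position tag


@dataclass
class FragmentTuple:
    mask: int                      # coverage mask, one bit per sample point
    color: Tuple[float, ...]       # RGBA
    z: float                       # depth (Z) value
    z_grad: Tuple[float, float]    # pair of Z gradient values


def popcount(mask: int) -> int:
    """Number of sample points covered by the mask."""
    return bin(mask).count("1")


class MergeBuffer:
    def __init__(self, capacity: int = 4, z_tol: float = 0.01) -> None:
        self.capacity = capacity   # predetermined number of stored tuples
        self.z_tol = z_tol         # hypothetical depth-similarity threshold
        self.storage: "OrderedDict[Tag, FragmentTuple]" = OrderedDict()
        self.to_frame_buffer: List[Tuple[Tag, FragmentTuple]] = []

    def _should_merge(self, a: FragmentTuple, b: FragmentTuple) -> bool:
        # Evaluation stage (simplified): disjoint masks and similar depth.
        return (a.mask & b.mask) == 0 and abs(a.z - b.z) <= self.z_tol

    def _merge(self, a: FragmentTuple, b: FragmentTuple) -> FragmentTuple:
        # Fragment merging stage: coverage-weighted average (an assumption).
        wa, wb = popcount(a.mask), popcount(b.mask)
        total = (wa + wb) or 1

        def mix(x: float, y: float) -> float:
            return (x * wa + y * wb) / total

        return FragmentTuple(
            mask=a.mask | b.mask,
            color=tuple(mix(x, y) for x, y in zip(a.color, b.color)),
            z=mix(a.z, b.z),
            z_grad=(mix(a.z_grad[0], b.z_grad[0]),
                    mix(a.z_grad[1], b.z_grad[1])),
        )

    def submit(self, tag: Tag, new: FragmentTuple) -> None:
        old = self.storage.get(tag)                  # comparison stage
        if old is not None and self._should_merge(old, new):
            self.storage[tag] = self._merge(old, new)  # update fragment storage
            return
        if old is not None:
            # Same pixel but not mergeable: send the old tuple onward.
            self.to_frame_buffer.append((tag, self.storage.pop(tag)))
        elif len(self.storage) >= self.capacity:
            # Storage full: evict the oldest tuple to the frame buffer.
            self.to_frame_buffer.append(self.storage.popitem(last=False))
        self.storage[tag] = new
```

In this model, two fragments with masks `0b0011` and `0b1100` submitted for the same tag leave a single stored tuple with mask `0b1111`, and nothing is sent to the frame buffer until that tuple is evicted or displaced by a fragment that fails the evaluation.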

Merging fragments in the merge buffer increases the 
rendering speed by reducing the number of fragments sent to 
the frame buffer to add or merge with a pixel's existing 
fragments. This in turn also reduces the amount of work 
required by the frame buffer to add or merge a new fragment 
with a pixel's existing fragments, by decreasing the average 
number of fragments stored with each pixel. The present 
invention merges fragments within a pixel from the same 
surface before the fragments reach the frame buffer. Each 
time a first and second fragment are merged, the invention 
avoids both writing the first fragment to the frame buffer, 
and subsequently reading the first fragment from the frame 
buffer. Therefore merging fragments in a merge buffer 
before the fragments reach the frame buffer significantly 
reduces frame buffer memory bandwidth requirements. This 
in turn increases the speed of the rendering process for a 
given amount of memory bandwidth. Alternatively, fewer or 
less expensive memory chips with less bandwidth may be 
used. Because fragments are merged, the amount of memory 
for storing the fragment information, including the subpixel 
information, may also be reduced. In addition, the present 
invention employs heuristics that decrease the likelihood 
that merging will introduce noticeable artifacts. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Additional objects and features of the invention will be 
more readily apparent from the following detailed descrip- 
tion and appended claims when taken in conjunction with 
the drawings, in which: 
FIG. 1 represents a tessellated surface and the associated
pixel fragments;

FIG. 2 represents a subdivision of a pixel of FIG. 1 into
subpixels;

FIG. 3 is a block diagram of an exemplary computer 
graphics system that can be used to practice the invention; 

FIG. 4 illustrates data structures stored in a pixel memory 
representing a plurality of fragment tuples; 

FIG. 5 is a block diagram of a graphics system with a 
graphics accelerator using the merge buffer of the present 
invention; 

FIG. 6 is a block diagram of the merge buffer of FIG. 5; 

FIG. 7A is a diagram of a block format of a block of 
fragments of FIG. 6; 

FIG. 7B is a diagram of a block with four fragments with 
their tags; 

FIG. 7C is a diagram of the fragment format of the block 
of FIG. 7A; 

FIG. 7D is a diagram of the primitive edge format of the
block of FIG. 7A;

FIG. 8 is a block diagram of a merge buffer pipeline of 
FIG. 6; 

FIG. 9 is a flowchart of a method for processing fragments 
using the merge buffer pipeline of FIG. 8; 

FIG. 10A is a block diagram of the fragment storage of 
FIG. 6; 

FIG. 10B is a block diagram of an alternate embodiment 
of the fragment storage of FIG. 6; 

FIG. 11 is a block diagram of an evaluation stage of FIG. 8;

FIG. 12 is a flowchart of a method for determining 
whether to merge fragments using the evaluation stage of 
FIG. 11; 

FIGS. 13A-13E illustrate a merge of two fragments' edge
signatures;

FIG. 14 is a block diagram of a coverage mask merge 
circuit of a fragment merging stage of FIG. 8; 

FIG. 15A is a block diagram of a color value merge circuit 
of the fragment merging stage of FIG. 8; 

FIG. 15B is a block diagram of an alternate embodiment 
of a color value merge circuit of the fragment merging stage 
of FIG. 8; 

FIG. 16A is a block diagram of a gradient merge circuit
of the fragment merging stage of FIG. 8; 

FIG. 16B is a block diagram of an alternate embodiment 
of the gradient merge circuit of the fragment merging stage 
of FIG. 8; 

FIG. 16C is a block diagram of another alternate embodi- 
ment of the gradient merge circuit of the fragment merging 
stage of FIG. 8; 

FIG. 16D is a block diagram of yet another alternate 
embodiment of the gradient merge circuit of the fragment 
merging stage of FIG. 8; 

FIG. 17 is a block diagram of an update fragment storage 
stage of FIG. 8; 

FIG. 18 is a flowchart of a method of operating an update 
block circuit of FIG. 17; 

FIG. 19 is a circuit diagram of a mask comparison circuit 
of the evaluation stage of FIG. 11; 

FIG. 20 is an exemplary hardware implementation of the
Z projection test of the depth comparison circuit of FIG. 11;
and

FIG. 21 is a flowchart of a method of determining color
similarity in the color comparison circuit of FIG. 11.

DESCRIPTION OF THE PREFERRED
EMBODIMENTS

The following embodiments of the present invention will 
be described in the context of a graphics accelerator used in 
conjunction with a computer system to display graphical 
images on a computer screen; however, those skilled in the 
art will recognize that the disclosed systems and methods are 
readily adaptable for broader application. For example, 
without limitation, the present invention could be readily 
applied in the context of a printer. 

The present invention enables a computer graphics system
to render high-quality, antialiased images using a reduced
amount of memory bandwidth and processing bandwidth.
The present invention includes a buffer in the graphics
pipeline that merges fragments belonging to the same tes-
sellated surface prior to sending them to the frame buffer for
further processing and display. As a result, the memory
bandwidth and processing requirements of the frame buffer
are reduced, thus allowing high quality images to be gen-
erated more economically.

System Overview

FIG. 3 shows a computer system 100 that can generate
monochrome or multicolor 2-dimensional (2D) and
3-dimensional (3D) graphic images for display according to
the principles of the present invention. The computer system
100 can be any of a wide variety of data processing systems
including, for example, a personal computer, a workstation,
or a mainframe.

In the computer system 100, a system chipset 104 may
provide an interface among a processing unit 102, a main
memory 106, a graphics accelerator 108 and devices (not
shown) on an I/O bus 110. The processing unit 102 is
coupled to the system chipset 104 by the host bus 112 and
includes one or more central processing units (CPUs). The
main memory 106 interfaces to the system chipset 104 by
bus 114.

The graphics accelerator 108 is coupled to the system
chipset 104 by a bus 116, by which the graphics accelerator
108 can receive graphics commands to render graphical
images. A graphics memory 122 and a display device 126
are coupled to the graphics accelerator 108; the graphics
memory 122 is coupled by bus 124, and the display device
126, by bus 127. The display device 126 preferably produces
color images, but the invention can also be practiced with a
monochrome monitor to display grayscale images or with
printers that print black and white or color images.

An image appears on the display by illuminating a par-
ticular pattern of individual points called pixels. While the
image rendered may be two dimensional (2D) or three
dimensional (3D), the display device itself generally
includes a two-dimensional array of pixels. The array size of
display screens can vary widely. Examples of display screen
sizes include 1024x768 and 1920x1200 pixels. For the
purposes of practicing the invention, the display device 126
may be any suitable pixel-based display, such as a CRT
(cathode ray tube), liquid-crystal display, laser printer, or
ink-jet printer.

The graphics memory 122 includes storage elements for
storing an encoded version of the graphical image to be
displayed. There is a direct correspondence between the
storage elements and each pixel on the display screen 130.
The storage elements are allocated to store data representing each pixel, hereafter referred to as pixel data. For example, five bytes may be used to encode a color representation for each pixel.

The values stored in the storage elements for a particular pixel control the color of the particular pixel on the screen 130. The "color" of a pixel includes its brightness or intensity. There are many different ways of representing color information, including direct color value representations and indirect representations in which the stored pixel data are indices used to access a color lookup table. The present invention is applicable to systems using any pixel representation method.

During operation, the computer system 100 can issue graphics commands that request an object to be displayed. The graphics accelerator 108 executes the graphics commands, converting the object into primitives and then into fragments. Alternately, processing unit 102 converts the object into primitives, and the graphics accelerator 108 converts the primitives into fragments. A primitive is a graphical structure, such as a line, a triangle, a circle, or a surface patch of a solid shape, which can be used to build more complex structures. A fragment is a two-dimensional polygon created by clipping a primitive, such as a line, triangle, or circle, to the boundaries of the pixel. A more detailed description of fragments is provided by Loren Carpenter in "The A-buffer, an Antialiased Hidden Surface Method", Computer Graphics Vol. 18, No. 3, 1984, pp. 103-107, incorporated by reference herein as background information.

The graphics accelerator 108 renders the fragments, and loads the pixel data corresponding to the fragments into the appropriate storage elements of the graphics memory 122. Additionally, pixel data can be transferred into the graphics memory 122 from the main memory 106 via busses 114, 116, and 124, or from processing unit 102 via busses 112, 116, and 124.

To display the image, the pixel data are read out of the graphics memory 122 and rendered as illuminated points of color on the screen 130 of the display device 126.

Pixel Subsample Data Storage

FIG. 4 shows an exemplary pixel 300 that is part of an image and is subdivided into a 4x4 subpixel array. The pixel 300 has four sampling positions S1, S2, S3, and S4. Pixel 300 is covered by three image fragments 301, 302, 303 from three different primitive objects (often herein called "primitives"). Each fragment 301, 302, 303 is associated with a fragment value, called a "fragment tuple," 310, 311, 312. For example, in FIG. 4, fragment tuple 310 is associated with fragment 301, fragment tuple 311 is associated with fragment 302 and fragment tuple 312 is associated with fragment 303.

Each fragment value includes a color value 314, a Z depth value 316, and Z gradient values 318. The color value 314 represents the color and opacity of the corresponding fragment at an approximation to the centroid of the fragment. The Z depth value 316 represents a Z coordinate value of the corresponding fragment along a Z axis that is perpendicular to the image. The Z coordinate is used to provide 3D depth. The Z gradient information, comprised of an x component and a y component, allows the reconstruction of the Z coordinate value at each of the sample points of the fragment.

In one embodiment, each fragment tuple uses five bytes of memory to represent the color 314, three bytes for the Z depth 316 and two bytes for the Z gradient 318. The five-byte color 314 field is used to store four 10-bit color parameters: Red, Green, Blue, and Alpha. These parameters are sometimes called "channels." The value stored in each RGB (Red, Green, Blue) channel indicates the intensity (or brightness) of that color channel. Low values correspond to low intensity, dark colors; high values correspond to high intensity, light colors. Various methods for producing the color combining the RGB values are well known in the art.

The opacity of the fragment is expressed by the value stored in the Alpha channel. For example, a 1.0 value (i.e., all 10 Alpha-channel bits are 1) indicates that the associated fragment is opaque, a 0.0 value indicates that the fragment is invisible, i.e., completely transparent, and values between 0.0 and 1.0 indicate degrees of transparency.

In general, a fragment does not have a single color value, as lighting models in common use allow the color to change, perhaps non-linearly, across the fragment. But since color values usually do not change much across a fragment, we use the color at a single point in the fragment to represent the color of the entire fragment. This point should be near the centroid of the fragment. The centroid of a fragment is the position of the fragment's center of mass. The center of mass can be thought of as the position at which the fragment would perfectly balance on a needle if you cut the fragment shape out of a piece of stiff paper. In FIG. 4, the point 306 is the centroid of fragment 301, the point 307 is the centroid of fragment 302, and the point 308 is the centroid of the fragment 303.

The approximation to the centroid can use a fairly simple computation. For example, the x offset of a fragment's centroid from the lower left corner of the pixel might be computed by adding the x offsets of all sample points within the fragment, then dividing by the number of sample points in the fragment. The y offset can be similarly computed. Though this is a crude approximation in the examples using four sample points, it is usually pretty accurate in an implementation using 16 sample points.

In general, a fragment does not have a single Z depth value, as the fragment's primitive is usually tilted with respect to the viewer. Unlike color values, representing the entire fragment with a single Z value leads to gross artifacts, as incorrectly computing which primitive is visible (nearer to the viewer) at several sample points may lead to large changes in the color of the pixel. Instead, Z values are computed at any point in the fragment using a planar (affine) equation of the form:

$Z(x, y) = A(x - x_0) + B(y - y_0) + C$

We choose the point $(x_0, y_0)$ arbitrarily, for example the lower left corner of the pixel. Note that this arbitrary point may be outside a fragment's boundaries. For example, in FIG. 4 only the fragment 301 contains the lower left corner of the pixel.

In one embodiment the Z depth field 316 is a three-byte field that contains the fragment's Z value computed at $(x_0, y_0)$; that is, the Z depth field 316 contains the value for the coefficient C in the planar equation. In this embodiment each Z gradient is a two-byte field that includes a one-byte x component and a one-byte y component. The one-byte x component of the Z gradients 318 supplies an approximate value for the coefficient A; the one-byte y component of the Z gradients 318 supplies an approximate value for the coefficient B. These values are represented in a floating-point format with a 2-bit mantissa (with an implicit leading 1), and a 6-bit exponent. Thus, the Z value at the lower left
corner of the pixel, in conjunction with the Z gradients, allows the computation of an approximate Z value at any sample point within the fragment. The number of bytes used for each field in the stored fragment and the particular data format of those fields may change from one implementation to another.

Memory is allocated to each pixel 50 (FIG. 2) for storing a predetermined number of fragment values, for storing a dynamic number of fragment values, or using other techniques well known in the art. This memory can be either graphics memory 122, as shown in FIG. 3, or main memory 106.

As shown in FIG. 4, each fragment tuple includes a coverage mask 322, with each bit of the mask indicating whether or not the fragment value applies to a corresponding one of the subpixel samples. Thus a fragment value with a coverage mask value of "1 0 0 0" corresponds to a fragment covering only subpixel S1, while a coverage mask value of "0 1 1 1" would indicate that the fragment value corresponds to a fragment covering subpixels S2, S3 and S4.

When rendering images having transparent or partially transparent fragments, the fragments for a pixel may have overlapping coverage masks. For example, one fragment might have a coverage mask of "0 1 1 1" while another fragment might have a coverage mask of "0 0 0 1", indicating that both fragments cover subpixel S4. The nearer fragment must be partially transparent, so that the farther fragment is visible at subpixel S4.

When rendering an image, the graphics accelerator 108 determines which fragments are visible at each subpixel sample. A fragment covers a subpixel when the center of the subpixel sample is within an area enclosed by the fragment or, in certain cases, on an edge of the fragment. For subpixels covered by more than one fragment, this determination is based on which fragment has the lowest Z depth at the subpixel, as well as the opacity of the fragments covering the subpixel. The fragments with the lowest Z depth (and thus are closest to the viewer) are referred to as foreground fragments. Fragments with higher Z depth values, which are

a graphics memory 122 including a texture memory 364 for storing image texture data and a frame buffer memory 366 for storing data regarding the next graphical frames to be displayed.

The graphics accelerator 108 processes the graphical commands in a pipeline. The graphical commands and data subsequently created by graphics accelerator 108 flow through a rasterizer circuit 374, a texture mapping circuit 376, a merge buffer 380 of the present invention, a frame buffer update circuit 382, and a display driver 384. The rasterizer circuit 374 rasterizes primitive graphical objects. In this description, the term "rasterizing" means generating fragments from the input commands (i.e., from the objects specified by those commands). The texture mapping circuit 376 applies a texture map to the fragments. The merge buffer 380 selectively merges the fragments using the techniques of the present invention, and a frame buffer update circuit 382 updates the frame buffer memory 366 with the fragments to be displayed. The fragments flow from the frame buffer update circuit 382 to the frame buffer memory 366 for output via a display driver 384 to the display device 126 (FIG. 3).

To display an antialiased 3D object on the display device 126, the object is first tessellated by the host processor to produce a set of primitive objects, such as triangles, that cover the surface of the object. In the preferred embodiment, the primitive objects are triangles. Referring back to FIG. 1, while some pixels are completely covered by a single primitive object, others are covered by two or more of the primitive objects. The portion of each pixel covered by each distinct primitive object corresponds to a distinct fragment. The finer the level at which a curved surface is tessellated (that is, the smaller the primitive objects), the higher the percentage of pixels that will be covered by more than one primitive object from the surface, and thus the more pixels that will have more than one fragment describing a portion of the surface.

After the 3D object or surface has been tessellated into primitive objects, the primitive objects are rasterized by the

further from the viewer, are referred to as background rasterizing circuit 374 (FIG. 5). The rasterizing circuit 374 

fragments. An opaque foreground fragment can occlude a 4 ° determines which of the display's pixels are contained 

background fragment behind that foreground fragment. within a primitive object, and determines the associated 

Accordingly, each fragment must pass a Z depth test at color, intensity and other data at each pixel within the 

one or more of the subpixel samples S1-S4, that is, the Z primitive. The rasterizing circuit 374 generates a fragment 

value 316 of the fragment tuple associated with that frag- for each pixel the primitive object cither fully or partially 

ment must be smaller, i.e., closer from the perspective of the 45 covers. The fragment is represented by a fragment tuple. If 

viewer, than the Z value 316 for every opaque fragment the primitive object belongs to a scene with other primitive 

covering the same subpixel sample. The Z depth test is used objects, multiple fragments may be generated for a particular 

regardless of whether the fragment in question is transparent pixel, with each fragment corresponding to a different primi- 

or opaque. If a fragment passes the Z depth test, then the tive object, As will be described below, the merge buffer 380 

graphics accelerator 108 stores the fragment tuple associated 50 identifies certain pairs of fragments from different primitive 

with the visible fragment in the pixel memory 320. objects that are likely to be from the same tessellated surface 

The displayed color of the pixel 300 depends upon the and merges them prior to delivering them to the frame buffer 

filtering function used to combine the fragment tuples asso- update circuit 382. 

ciated with the subpixel samples S1-S4. One filter function The fragments flow from the rasterizing circuit 374 to the 

simply uses a weighted average of the colors of the fragment 55 texture mapping circuit 376. If texture mapping is enabled, 

tuples associated with the four subpixels samples S1-S4. the texture mapping circuit 376 applies a texture to the 

„ „ « „ fragments and outputs the textured fragments to the merge 

Graphics System With Merge Buffer buffer 380 

FIG. 5 shows an implementation of a graphics system 350 The merge buffer 380 selectively merges fragments that 

of the present invention, which provides internal details en are associated with the same pixel before sending the 

about the graphics accelerator 108 and graphics memory 122 fragments to the frame buffer update circuit 382. To do so, 

of FIG. 3. The graphics system 350 includes: the merge buffer 380 applies heuristics that increase the 

a graphics accelerator 108 for receiving graphical com- probability that any fragments that are merged belong to the 

mands from processing unit 102 (FIG, 3), processing same tessellated surface, and that merging will not introduce 

the graphics commands to create a graphical image, 65 undesirable visual artifacts. As a result of merging, the 

and outpulting the graphical image data in a formal to number of fragments transferred to the frame buffer update 

be displayed; circuit 382 is reduced. Because fewer fragments are 
transferred, the invention reduces the rate at which the frame
buffer update circuit 382 processes new fragments. The
invention further reduces the number of existing fragments
which must be read from and written to the frame buffer
memory 366 when processing a new fragment. Therefore,
the bandwidth required of the frame buffer update circuit
382 and frame buffer memory 366 for a given level of
performance is reduced, thereby improving performance and
reducing implementation cost. Since the merge buffer 380
reduces the number of fragments that are processed for a
given pixel during a given time interval, the number of stalls
in the graphics accelerator pipeline 108 will also be reduced.

After processing in the merge buffer, the frame buffer
update circuit 382 adds or blends each fragment output from
the merge buffer 380 with previously-received fragments
that correspond to the same pixel, and stores the resulting
fragments in the frame buffer memory 366. When a new
fragment is generated for a pixel, the frame buffer update
circuit 382 adds the new fragment to the pixel's existing
fragments, replaces one of the pixel's existing fragments
with the new fragment, or blends the new fragment with one
or more of the existing fragments. The frame buffer update
circuit 382 blends the colors of the new fragment and the
existing stored fragments to generate a color of a pixel to
output to the display. Preferably, the fragments are blended
using the techniques described in U.S. patent application
Ser. No. 09/301,257 for METHOD AND APPARATUS FOR
COMPOSITING COLORS OF IMAGES USING PIXEL
FRAGMENTS WITH Z AND Z GRADIENT
PARAMETERS, and described more comprehensibly by
Norman P. Jouppi and Chun-Fa Chang in "Z³: An Economical
Hardware Technique for High-Quality Antialiasing and
Transparency" in Proceedings of the 1999
EUROGRAPHICS/SIGGRAPH Workshop on Graphics
Hardware, ACM Press, New York, August 1999, pp. 85-91,
both incorporated herein by reference as background
information.

The Merge Buffer 

As discussed above, tessellation and rasterization of a
three dimensional surface can generate multiple fragments
for at least some of the pixels, and therefore the
texture-mapping circuit and the frame buffer update circuit
can process multiple fragments for the same pixel from the
same surface. When fragments from adjacent primitives that
cover portions of the same pixel belong to the same
tessellated surface, those fragments will often have nearly
identical color and depth values. Therefore, the memory and
processing bandwidth of the frame buffer update circuit can
be reduced if such fragments are merged. It is desired that
fragment merging should result in no noticeable loss of
visual quality.

Loss of visual quality may occur when two fragments that
cover adjacent portions of the same pixel are merged, but
belong to different tessellated surfaces. If the fragments
belong to different objects or surfaces, then the fragments
may be separated in the Z dimension, perpendicular to the
screen, by a gap. At some future time, the image rendering
process may insert another object into the gap. If these first
two fragments are not merged, the two fragments retain their
different depth values, and the future object can be rendered
properly in front of one fragment but behind the other
fragment. But if the two original fragments from different
objects are merged, the merged fragment will have a single
depth value and a future object that lies between the two
original fragments will be incorrectly rendered as
completely behind or completely in front of both of the
original fragments. The merge buffer uses several heuristics
to decrease the probability of merging two fragments
belonging to different objects or surfaces.
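
The depth-gap problem described above can be illustrated with a
small numeric sketch. The Z values, the averaging merge rule, and
the helper name occluded_by are assumptions chosen for
illustration only; the source does not state how a merged depth
value is computed.

```python
def occluded_by(fragment_z: float, object_z: float) -> bool:
    """True if an opaque fragment at fragment_z hides an object at object_z.

    Smaller Z means closer to the viewer, as in the Z depth test.
    """
    return fragment_z < object_z


near_z, far_z = 10.0, 30.0        # two fragments from different surfaces
future_z = 25.0                   # an object later drawn into the gap

# Unmerged: each fragment keeps its own depth, so the new object is
# hidden by the near fragment but drawn in front of the far one.
unmerged = [occluded_by(z, future_z) for z in (near_z, far_z)]

# Merged: a single depth value (here simply the average, which is an
# assumption) places the new object entirely behind the merged fragment.
merged_z = (near_z + far_z) / 2
merged = occluded_by(merged_z, future_z)
```

With these numbers the unmerged pair orders the new object
correctly against each fragment, while the merged fragment hides
it over the whole pixel.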

Loss of visual quality may also occur when two fragments 
belonging to the same tessellated surface are merged, but the 
fragments are sufficiently different that a single merged 
fragment cannot adequately represent them. For example, if 
the fragments face in substantially different directions, a 
combined fragment cannot represent the sharp edge between 
them, and nearby objects may be erroneously obscured or 
made visible by a single merged fragment. The merge buffer 
uses several heuristics to decrease the probability that
merging two fragments will result in visually objectionable
artifacts.

As shown in FIG. 6, the merge buffer 380 includes an 
input queue 388, a main merge block 390, and an output 
queue 392. The input queue 388 isolates the rasterizer circuit
(374, FIG. 5) and the texture-mapping circuit (376, FIG. 5)
from the main merge block 390 to allow fragments to
continue to be generated even if the main merge block 390
processes certain fragments more slowly than the rasterizer
circuit outputs them to the merge buffer 380. The output
queue 392 isolates the main merge block 390 from stalls that 
may occur in the frame buffer update stage 382. 

The main merge block 390 has a merge buffer pipeline 
394 and a fragment storage 396. The input queue 388, merge 
buffer pipeline 394, fragment storage 396 and output queue 
392 receive and output fragments in blocks, where each 
block has a predetermined number R of fragments. The input 
queue 388 receives new blocks of fragments. In the main 
merge block 390, a new block N of fragments is retrieved 
from the input queue 388 and inserted into the merge buffer 
pipeline 394. The merge buffer pipeline 394 selects an 
existing block E of fragments already stored in the fragment 
storage 396 to merge with the new block N of fragments. 
The merge buffer pipeline 394 merges those fragments 
meeting predetermined merge criteria and stores the merged 
fragments in the fragment storage 396. Fragments that do 
not meet the predetermined merge criteria are not merged. 
The new and existing blocks of fragments pass through all 
the stages of the merge buffer pipeline 394. The fragment 
storage 396 ejects blocks of fragments to the output queue 
392. For a given x-y screen position, blocks are ejected in 
the same order that the blocks were received by the input 
queue 388, taking into account fragment merges. 

As shown in FIG. 7A, a block 410 in the fragment memory
stores a predetermined number R of fragments 420 together
with a tag 414, a mergable bit 416, a likely-to-merge bit 418,
and primitive edges 419.

The mergable bit 416 indicates whether the fragments of 
a block are available or unavailable for merging. The mer- 
gable bit 416 is usually set to "mergable" by the rasterizer, 
and maintains this status when a block is first written to 
fragment storage 396. For correctness, only the most recent 
block stored in fragment storage 396 for a given x-y screen 
position may be merged with a new block. The mergable bit 
of any block in fragment storage 396 may be set to "not 
mergable" by the merge buffer pipeline 394 to maintain this 
condition. This includes the setting of a block in fragment 
storage 396 to "not mergable" when a new block at the same 
x-y screen position, and marked "not mergable," arrives at 
the merge buffer. 
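
The mergable-bit rule above can be sketched in software as
follows. This is only an illustrative model, not the hardware
described in the source: the class names Block and
FragmentStorage and the methods merge_candidate and store are
hypothetical, and the storage is modelled as a plain list ordered
oldest first.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    tag: tuple                       # (x, y) screen position of the block
    mergable: bool = True            # models mergable bit 416
    fragments: list = field(default_factory=list)


class FragmentStorage:
    def __init__(self):
        self.blocks = []             # oldest first

    def merge_candidate(self, new: Block):
        """Return the most recent block with the same tag, if both are mergable."""
        for blk in reversed(self.blocks):
            if blk.tag == new.tag:
                return blk if (blk.mergable and new.mergable) else None
        return None

    def store(self, new: Block):
        """Store a block as a new entry.

        Every older block at the same x-y position becomes "not mergable",
        so only the most recent block for a position can ever be merged.
        """
        for blk in self.blocks:
            if blk.tag == new.tag:
                blk.mergable = False
        self.blocks.append(new)
```

In this model, a "not mergable" block arriving at a position both
fails to find a candidate and, once stored, closes off the older
block at that position.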

The mergable bit 416 is set to "mergable" by the rasterizer 
only if the techniques described later were used to create the 
illusion that the triangle is curved, and smoothly connects up 
to adjacent "curved" triangles with a "rounded" edge. This 
is typically the case when a curved surface is tessellated into
triangles. The mergable bit 416 is set to "not mergable" by
the rasterizer to indicate that lighting computations were
applied to the triangle as a flat surface, which connects to
other flat surfaces with a sharp edge. This ensures that a
non-curved tessellated surface (for example, a block with six 
faces) maintains its sharp edges. 

The likely-to-merge bit 418 is used to identify those
blocks of fragments that contain the most recently
encountered interior or exterior edge of a tessellated surface.
Interior edges do not exist in the desired surface, but are an
artifact caused by tessellating the surface into primitives
such as triangles. Each such interior edge is shared between
two primitives that belong to the surface. For example,
triangles 32-1 and 32-2 (FIG. 1) share an interior edge. If a
surface is tessellated using triangle strips or fans, for
example, each triangle in the strip or fan contains an old
edge shared with the previous triangle (if any), one new edge
that will be shared with the next triangle (if any), and one
new edge that does not immediately adjoin either the
previous or next triangle. Blocks that contain the new edge
that will be shared with the next triangle have their
likely-to-merge bit 418 set to True, as they are likely to
merge with blocks that are generated in the near future when
the adjoining triangle is rasterized.

The likely-to-merge bit 418 is generated in circuitry
external to the merge buffer. In one implementation, the
rasterizer circuit generates the likely-to-merge bit. For
example, the rasterizer can set the likely-to-merge bit to True
for fragment blocks that are bisected by the newest internal
edge (i.e., of the most recently generated triangle) in a
triangle strip or fan. A block is bisected by an edge if some
sample points in the block are on one side of the edge, and
the rest of the sample points are on the other side of the edge.
In other words, the rasterizer will preferably set the
likely-to-merge bit to True when generating fragments along
the most recently encountered internal edge of a tessellated
surface, and otherwise will set the likely-to-merge bit to
False.
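
The three kinds of edge in a triangle strip can be enumerated
with a short sketch. It assumes the conventional strip layout in
which triangle i uses vertices i, i+1 and i+2; the function name
strip_edges and the dictionary keys are hypothetical. Blocks
bisected by the "next" edge are the ones whose likely-to-merge
bit would be set to True.

```python
def strip_edges(num_vertices: int):
    """Classify the edges of each triangle in a triangle strip.

    Triangle i is assumed to use vertices (i, i+1, i+2). Edges are
    returned as pairs of vertex indices:
      old   - shared with the previous triangle (None for the first)
      next  - will be shared with the next triangle (None for the last)
      outer - adjoins neither the previous nor the next triangle
    """
    num_triangles = max(num_vertices - 2, 0)
    result = []
    for i in range(num_triangles):
        old_edge = (i, i + 1) if i > 0 else None
        next_edge = (i + 1, i + 2) if i < num_triangles - 1 else None
        outer_edge = (i, i + 2)
        result.append({"old": old_edge, "next": next_edge, "outer": outer_edge})
    return result
```

For a five-vertex strip this yields three triangles; the middle
one has old edge (1, 2), next edge (2, 3) and outer edge (1, 3).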


The primitive edges 419 initially represent the edges of
the triangle for which the fragment block was generated.
After one or more merges, they represent a subset of the
edges of the polygon that is the union of the merged
fragments' triangles. For example, in FIG. 1, all blocks
generated on behalf of triangle 32-1 initially contain the
three vertices of triangle 32-1 in the primitive edges; all
blocks generated on behalf of triangle 32-2 initially contain
the three vertices of triangle 32-2. After two blocks in the
same position along the shared edge between triangles 32-1
and 32-2 are merged, the merged block contains at most two
edges from the quadrilateral formed by joining triangles
32-1 and 32-2 and removing the shared edge between them.

The primitive edges 419 are shown in more detail in FIG.
7D. The primitive edges 419 consist of three vertex
hashes 444-1, 444-2, 444-3, and three bisection bits 446-12,
446-23, and 446-31. To facilitate finding a shared edge
between two primitives, the vertex hashes are always stored
in clockwise order.
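
One reason a fixed winding order helps can be sketched as
follows. When two adjacent triangles are both stored clockwise,
their shared edge is traversed in opposite directions, so a
search only has to look for a reversed pair. The function names
and the use of plain Python values as stand-ins for vertex
hashes 444 are assumptions for illustration.

```python
def directed_edges(tri):
    """Return the three directed edges of a triangle given as (v1, v2, v3)."""
    return [(tri[i], tri[(i + 1) % 3]) for i in range(3)]


def shared_edge(tri_a, tri_b):
    """Find an edge of tri_a that tri_b traverses in the opposite direction.

    Both triangles are assumed to list their vertex hashes in clockwise
    order. Returns the edge as it appears in tri_a, or None.
    """
    edges_b = set(directed_edges(tri_b))
    for p, q in directed_edges(tri_a):
        if (q, p) in edges_b:
            return (p, q)
    return None


# Hypothetical hashes: A=(0,0), B=(0,1), C=(1,0), D=(1,1), both clockwise.
edge = shared_edge(("A", "B", "C"), ("C", "B", "D"))
```

Here the edge between B and C is found, while two triangles with
no common reversed pair return None.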

Each vertex hash 444-1, 444-2, and 444-3 is a
representation of the (x, y, z) coordinates of one vertex of
the primitive (triangle). For greatest certainty that two
primitives are part of the same surface, each vertex hash 444
contains the full (x, y, z) coordinates of one of the vertices.
Alternatively, to reduce storage requirements in fragment
storage 396, the invention applies a hash function to each (x,
y, z), and stores the resulting hash values in vertex hash 444.

A hash function "h" takes a coordinate (x, y, z) in 3D
space, and performs arithmetic or logical operations to
reduce it to a single value with a smaller number of bits.
However, this reduced storage comes at a cost: two
coordinates that are different may have the same hash value
("alias"). That is:

$h(x_1, y_1, z_1) = h(x_2, y_2, z_2)$

even when

$(x_1, y_1, z_1) \neq (x_2, y_2, z_2)$.

To minimize problems with such aliases, the hash func- 
tion h might be chosen so that vertices that are near each 
other in 3D space do not hash to the same value. For 
example, the hash function h might concatenate the bottom 
8 bits of the x, y, and z coordinates to create a 24-bit hash 
value. Since the limited size of fragment storage 396 means 
that most blocks will have an x-y tag 414 in a small region 
of 2D space, such a hash function will minimize the chance 
of aliasing two edges whose vertices have the same hash 
value, but different coordinates. Alternatively, a hash func- 
tion that is less efficient to implement, but with strong 
mathematical guarantees about aliasing frequency, can be 
employed. See, for example, the chapter "Some Applica- 
tions of Rabin's Fingerprinting Method" by Andrei Z. 
Broder, in Sequences II: Methods in Communications, 
Security, and Computer Science, edited by R. Capocelli, A. 
De Santis, U. Vaccaro, published by Springer-Verlag, 1993, 
available at ftp://ftp.digital.com/pub/DEC/SRC/ 
publications/broder/fing-appl.ps, and incorporated by refer- 
ence herein. 
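
The example hash mentioned above, concatenating the bottom 8
bits of x, y and z into a 24-bit value, might be sketched as
follows. It assumes the coordinates are available as integers
(for example fixed-point values); the function name vertex_hash
and the bit order of the concatenation are illustrative choices.

```python
def vertex_hash(x: int, y: int, z: int) -> int:
    """Concatenate the low 8 bits of x, y and z into a 24-bit hash."""
    return ((x & 0xFF) << 16) | ((y & 0xFF) << 8) | (z & 0xFF)


h_a = vertex_hash(1, 2, 3)
h_near = vertex_hash(2, 2, 3)      # a nearby vertex hashes differently
h_alias = vertex_hash(257, 2, 3)   # 256 units away in x: same low bits
```

Nearby vertices differ in their low bits and so receive
different hashes, while aliases only arise between vertices
whose coordinates differ by a multiple of 256 in every
component.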

Initially, the three edges of the triangle primitive are 
specified by the three vertex hashes 444-1, 444-2, 444-3. 
One edge is between vertex hashes 444-1 and 444-2, one 
between vertex hashes 444-2 and 444-3, and one between 
vertex hashes 444-3 and 444-1. The bisection bit 446-12 is 
associated with the edge between 444-1 and 444-2, bisection 
bit 446-23 is associated with the edge between 444-2 and 
444-3, and bisection bit 446-31 is associated with the edge 
between 444-3 and 444-1. An edge's corresponding bisec- 
tion bit 446 is set to True if the edge bisects the fragment 
block, that is, if some sample points in the block are on one 
side of the edge, and some sample points in the block are on 
the other side. The bisection bit 446 is set to False if all of 
the sample points in the block are on the same side of the 
edge. The bisection bits 446 can easily be computed by a 
fragment generator based upon half-plane equations, such as 
that described by Juan Pineda in "A Parallel Algorithm for 
Polygon Rasterization," SIGGRAPH 88 Conference 
Proceedings, ACM Press, New York, August 1988, pp. 
17-20, incorporated by reference herein as background 
information. 
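
A bisection bit can be computed from a half-plane (edge)
function along the following lines. This is a minimal sketch in
the spirit of the half-plane approach cited above, not the
circuit of the source: the function names are hypothetical, and
treating a sample that lies exactly on the edge as belonging to
neither side is an assumption.

```python
def edge_function(p0, p1, sample):
    """Signed half-plane value of `sample` relative to the edge p0 -> p1.

    The sign tells which side of the edge the sample lies on; zero
    means the sample is exactly on the edge.
    """
    (x0, y0), (x1, y1), (x, y) = p0, p1, sample
    return (x - x0) * (y1 - y0) - (y - y0) * (x1 - x0)


def bisection_bit(p0, p1, samples) -> bool:
    """True if the block's sample points fall on both sides of the edge."""
    values = [edge_function(p0, p1, s) for s in samples]
    return any(v > 0 for v in values) and any(v < 0 for v in values)


# Edge along the diagonal from (0, 0) to (4, 4).
split = bisection_bit((0, 0), (4, 4), [(1, 0), (0, 1)])      # both sides
same_side = bisection_bit((0, 0), (4, 4), [(1, 0), (2, 0)])  # one side only
```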

After merging, as discussed below, the primitive edges 
419 represent two connected sides (i.e., an open jaw) of the 
polygon that results from the union of two or more primi- 
tives. Although the embodiment described above uses three 
vertex hash values and three bisection bits, the scheme is 
extensible to any number of vertices and bisection bits, so 
that more than two connected edges of the polygon may be 
maintained after merging. 

Referring now to FIG. 7B, each fragment 420-426 in the 
block 410 corresponds to a different pixel from a rectangular 
region of the display. The rectangular region has a width of 
W pixels and a height of R/W pixels. The tag 414 uniquely 
identifies the (x, y) location of each block 410, and each of 
the fragments in the block are said to be associated with the 
tag 414 for the block. In one embodiment, the tag 414 is the
pixel coordinate of one of the corner fragments of the block
410. For example, for a square block 410 having four
fragments 420-426, each fragment 420-426 corresponding
to the pixels with coordinates (x, y), (x, y+1), (x+1, y)
and (x+1, y+1), respectively, the tag 414 of the block is (x,
y). The coordinates correspond to the location of the pixel on
the display and are commonly referred to as screen
coordinates. (Note that the tag need not include the least
significant bits of x and y that are constant for all blocks.
In the example block size of 2x2 pixels, the least significant
bit of x and the least significant bit of y are always 0 for
the fragment in the lower left corner.)

In FIG. 7C, the exemplary fragment 412 stored in the
fragment memory 482 (FIG. 10B) includes a coverage mask
432, color values 434, depth value (Z depth) 436, Z gradient
values (Z grad) 438, centroid offsets 440, and normal vector
442.

Several of the fragment fields stored in fragment memory
are identical to information stored in the frame buffer
memory 366 and previously described in reference to FIG.
4. The coverage mask 432 is identical to coverage mask 322
(FIG. 4), color values 434 are identical to color values 314,
Z depth 436 is identical to Z depth 316, and Z gradient
values 438 are identical to Z gradients 318.

The centroid offsets 440 are the x and y distances from the
lower left hand corner of the pixel to the approximation of
the centroid of the fragment. These need only a few bits of
precision apiece, for example one bit more than that required
to represent the subpixel grid on which the sample points lie.
In FIG. 4, with four sample points and thus a 4x4 subpixel
grid, the x and y centroid offsets 440 might be stored with
3 bits apiece.

The normal vector 442 (if available from the rasterizer) is
a triplet (x, y, z), with a length of one (i.e.,
sqrt(x^2 + y^2 + z^2) = 1), which indicates in which direction
in 3D space the fragment is facing. The normal vector is
perpendicular to the fragment's surface, and in general, is
different for each fragment. When a curved surface is
tessellated into triangles, the triangles are flat (planar) in
space. That is, the Z depth can be expressed as an affine
function of x and y. We might therefore assign the same
normal vector to each point on the triangle. However,
applying lighting computations to such a surface (flat-shaded
lighting) gives it a faceted look. For example, a sphere
tessellated into many triangles that are then flat-shaded
looks like a geodesic dome rather than a sphere. This faceted
effect persists even when a curved surface is subdivided into
a large number of very small triangles, as the human optic
system includes a rather impressive edge detection system.

Thus, well known mathematical techniques are applied to
lighting computations to make it appear that different
portions of a flat triangle face in different directions. In
particular, a different normal vector is supplied for each of
the triangle's vertices; these normals are then implicitly or
explicitly interpolated across the triangle, so that each point
in the "flat" triangle has a different normal vector. A light
source is reflected from this "curved" surface at slightly
different angles from each point on the triangle. This leads
to much more realistic lighting effects, as the boundary
between different triangles is hidden by smoothly changing
colors, rather than accentuated by a sharp difference in
colors.

If the graphics accelerator supports a computationally
expensive lighting model like Phong shading, the normal
vector is explicitly interpolated by the rasterizer (374, FIG.
5) for each fragment. In this case, the rasterizer can provide
the per-fragment normal vectors to the merge buffer. In the
event that per-fragment normals are not available, the
rasterizer might compute an average of the three normal
vectors provided at the vertices, and supply the same average
normal vector for each fragment in the triangle. In the least
desirable case, the rasterizer provides no normal vector
information to the merge buffer. In this case, no storage is
allocated for normal vector 442 in the merge buffer, and
inferior approximations, discussed later, may be used in
determining when two fragments may be merged or not. Even
when no normal information is available from the rasterizer,
it can still indicate whether a triangle is lit with
flat-shading, or as a curved surface, via the mergable bit as
previously discussed.

As shown in FIG. 8, there are four major steps that are
taken when a new block N 452 of fragments enters the merge
buffer pipeline 394 for processing. Each major step is
implemented as a separate stage of the merge buffer pipeline
394. The merge buffer pipeline 394 processes a new block
N of fragments and one of the existing blocks E of fragments
from the fragment storage 396. The four stages of the merge
buffer pipeline 394 include: (A) a tag comparison stage 454,
(B) an evaluation stage 456, (C) a fragment-merging stage
458, and (D) an update fragment storage stage 460. These
merge buffer pipeline fragment processing stages 454-460,
and the corresponding image data processing steps
performed by those stages are described in more detail below.

Referring to both FIGS. 8 and 9, a general overview of the
operation of the merge buffer pipeline will now be provided.
In step 462, the tag comparison stage 454 receives a new
block N from the input queue 388 (FIG. 6). In step 464, the
tag comparison stage 454 compares the tag of the new block
N to the tags of the existing blocks in the fragment storage
396 to determine whether some or all of the fragments in the
new block N could be merged with the fragments of one of
the existing blocks. More specifically, step 464 determines if
there is a block E in the merge buffer's fragment storage that
has the same tag as the new block N, and that both N's and
E's mergable bits are set to "mergable."

Step 466 determines whether the result of the comparison
is a match. If not, in step 468, an entry at the end of the
fragment storage 396 will be allocated and the new block N
will be stored into the allocated entry. This may be
accomplished by writing the new block N directly into
fragment storage 396, or by passing the new block N
unmodified through the remaining stages of the merge buffer
pipeline before being stored in the fragment storage 396.

If in step 466 there is a match, there is exactly one existing
block E that has the same tag 414 as N, and has its mergable
bit 416 set to "mergable." The fragment storage selects and
outputs for merging this block E, which is the most-recently
inserted block having the same tag as the new block N. In
this description, the term "inserted" also means "stored."
Selecting the most-recently inserted block ensures that the
merge buffer does not reorder blocks having the same tag,
which may lead to undesirable artifacts that violate the
semantics of standard 3D application programming
interfaces.

In step 470, the evaluation stage 456 compares each
fragment of the new block N with a corresponding fragment
in the existing block E to generate exactly one of five
outcomes for each fragment, based on predetermined
similarity criteria. The five outcomes are: don't-care,
replace-with-new, replace-with-old, merge and don't-merge.

In step 472, for each respective fragment position in a
block, the fragment merging stage 458 generates the
fragment from new block N, the fragment from existing block E,
or a merged fragment that combines data from the new block
N and the existing block E, based on the respective outcome
produced by the comparison at step 470. In step 474, the
update fragment storage stage 460 selects a block, either the
new block N or the existing block E, into which each new,
existing or merged fragment is to be stored based on the
outcomes from the evaluation stage 456 and other criteria
which will be discussed below.

In step 476, if the new block N has at least one valid
fragment left after step 474, the update fragment storage
stage 460 allocates and copies the new block N into an entry
in the fragment storage 396, and sets E's mergable bit to
False. In step 478, if block E has been modified, the update
fragment storage stage 460 copies the modified portions of
existing block E back into its entry in the fragment storage
396.

The fragment storage 396 and each of the four stages
454-460 of the merge buffer pipeline 394 will next be
discussed in detail.

In FIG. 10A, the fragment storage 396 stores the fragment
data in a fragment memory 482. The fragment memory 482
is implemented as a queue that stores the blocks in one or
more entries 484. The queue maintains a first-in-first-out
ordering of the blocks of fragments, but allows a new block
of fragments to be merged with an older block previously
stored in the queue. The queue has a tail pointer register 486
that points to the entry from which the fragment data was
least recently ejected, that is, the next available empty entry.
The queue has a head pointer register 488 that points to the
entry in which the fragment data was least recently inserted,
that is, the next entry to be ejected.

To select a potentially mergable block for merging, the tag
comparison stage 454 has comparison circuitry to compare
the tag of the new block N with the tags of existing blocks
in fragment memory 482. In one embodiment, the fragment
memory 482 is an associative memory that compares the
tags.

As shown in FIG. 10B, in an alternate embodiment, to
reduce the size of the associative memory, the fragment
storage 396 has a fragment memory 482 and an associative
memory 492 (Associative XY Memory). The associative
memory 492 stores a predetermined fixed portion of the tag
(414, FIG. 7A) for each block, not the entire tag. This
portion of the tag stored in associative memory 492 is
hereafter called the "partial tag." The fragment memory 482
stores the remaining information for each block, including
the portion of the tag not stored in the associative memory
492. There is a one-to-one correspondence between the
memory locations of the fragment memory 482 and the
associative memory 492, such that each memory location
having the same address in the associative memory 492 and

stored in the associative memory with the corresponding
portion of the tag of the new block N. If the retrieved portion
of the tag from the fragment memory 482 matches the
corresponding portion of the tag from the new block N, the
corresponding existing fragment block E is output from the
fragment storage 396.

The preferred embodiment limits the number of partial tag
candidate matches to at most one. That is, only one block E
in the associative memory 492 can have the same partial tag
as the new block N, and have its mergable bit set to
"mergable." This way, at most a single entry must be read
from fragment memory 482 during tag comparison stage
454, and further verification of the rest of the tag bits is left
to evaluation stage 456. If evaluation stage 456 determines
that the rest of E's tag does not match the rest of N's tag,
then block N passes through the merge pipeline unmodified,
and E's mergable bit is set to "not mergable." This is
accomplished by forcing all fragments in N to have a
don't-merge outcome.

This embodiment allows a block E, whose partial tag
matches block N's partial tag, to be marked "not mergable"
even when block E's complete tag is not identical to block
N's. This problem can largely be avoided by choosing the
size of the partial tag based upon the number of merge buffer
entries. If the merge buffer contains 2^q entries, then the
bottom q bits of the x and y position of the block (after
removing the x and y bits that are constant across all blocks)
are candidates for the partial tag. This ensures that any new
block N in a sequence of blocks that are connected in the
screen's x-y space will pass the full tag comparison test with
an existing block E if their partial tags match.

Referring back to FIG. 10A, eventually each block of
fragments in the fragment storage 396 is ejected from the
merge buffer into the output queue 392. When the fragment
storage 396 is full, the least-recently-inserted block, which
is pointed to by the head pointer register 488, is ejected.
When the fragment storage 396 is not full, blocks continue
to be ejected at a substantially reduced rate. In one
implementation, a block is ejected from the fragment storage
396 every n cycles, for example every 16. Alternately, all the
blocks in the fragment storage 396 are ejected after a
predetermined number of cycles have elapsed without
receiving a new block. A flush operation is provided for
synchronization. For example, before copying any data from
the frame buffer, a flush operation is sent down the graphics
pipeline, which ensures that the entire contents of fragment
storage 396 are ejected before the copy operation proceeds
down the pipeline.

It may be important that the ordering of the blocks be
preserved to gain the benefits of prior optimizations of the
image data. Examples of prior optimizations include
generating blocks in an order that minimizes page crossings
in a frame buffer, or cache misses in a texture cache. If any such

the fragment memory 482 is associated with the same block. 55 prior optimizations are still relevant, to preserve the 

During operation, the associative memory 492 identifies a ordering, the merge buffer can eject blocks from the frag- 

set of prospective candidate matches P between the new mcnt storage 396 in FIFO (first in first out) order. Each block 

block N and the existing blocks that have tags that are lhat is ejected is the least recently inserted block in the 

sufficiently similar to warrant further investigation. To iden- fragment storage 396 at the time that it is ejected, 

tify the set of blocks of prospective candidate matches P, the 60 To show that the merge buffer preserves the ordering of 

associative memory 492 determines whether the partial tags fragments, the general operation of the merge buffer will be 

of any existing block are the same as the partial tag for the described using the following example. Consider a sequence 

new block N, and if the existing block's mergable bit is set of n blocks that are inserted into the fragment storage 396 in 

to "mergable." For each prospective candidate match, the the following order: 1, 2, ... n. As these blocks are inserted 

fragment storage 396 accesses the fragment memory 482 to 65 into the fragment storage, blocks having the same tag may 

retrieve the portion of the lag not stored in the associative merge, thereby deleting multiple instances of blocks having 

memory, and compares the retrieved portion of the tag not the same tag from the sequence. However, when a new block 
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has a different tag from the tags of the existing blocks, or if 
the new block has the same tag as an existing block but does 
not meet other merge criteria in evaluation stage 456, the 
new block is stored at the end of the fragment storage at the 
entry pointed to by the tail pointer register 486, and therefore 5 
cannot be stored out of order. 
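The ordering argument above can be illustrated with a small software model. The following is a minimal sketch, not the hardware described here: the class name `MergeQueue`, the dictionary layout of a block, and the whole-block `can_merge` callback are assumptions made for illustration, and a "merge" is modeled simply as concatenating fragment lists.

```python
from collections import deque


class MergeQueue:
    """Toy FIFO of blocks; left end is the head (oldest), right end the tail."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = deque()
        self.ejected = []  # tags in the order the blocks left the queue

    def insert(self, tag, fragments, can_merge=lambda old, new: True):
        # Look for an existing block with the same tag that may still merge.
        for block in self.entries:
            if block["tag"] == tag and block["mergable"]:
                if can_merge(block["fragments"], fragments):
                    # Merge in place: the queue order is unchanged.
                    block["fragments"] = block["fragments"] + fragments
                    return
                # Only the newest block per tag may merge with future blocks.
                block["mergable"] = False
                break
        if len(self.entries) == self.capacity:
            self.eject()  # storage full: eject the least recently inserted
        # A block that does not merge always goes to the tail.
        self.entries.append(
            {"tag": tag, "fragments": fragments, "mergable": True})

    def eject(self):
        block = self.entries.popleft()
        self.ejected.append(block["tag"])
        return block
```

Because unmerged blocks are only ever appended at the tail and removed at the head, the ejection order in this model matches the insertion order of the surviving blocks.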

Alternatively, such ordering constraints may be non-existent, or
unimportant at this point in the fragment processing pipeline. For
example, if the merge buffer is after the texture mapping unit, it is
not necessary to maintain block ordering intended to minimize cache
misses in the texture cache. In such cases, the fragment storage 396
can be treated more like a cache, with the only ordering requirement
being that blocks with the same tag must be ejected in the same order
they entered the merge buffer. This relaxed ordering requirement allows
the merge buffer to eject a block which is unlikely to merge, while
keeping blocks which are still likely to merge, even when the block
that is unlikely to merge is newer than other blocks that are likely to
merge.

Evaluation Stage 

When a match is found between the tags of a new block N of fragments
with its mergable bit set to "mergable," and an existing block E with
its mergable bit set to "mergable," the evaluation stage 456 compares
the fragments within the new block N and the existing block E to
determine whether any fragments can be merged. That is, each fragment
n_i in the new block N is compared to the corresponding fragment e_i in
the existing block E. The objective of these comparisons is to
determine whether each (n_i, e_i) pair of fragments is sufficiently
similar to merge without adversely affecting visual quality. For each
fragment, the evaluation stage generates exactly one of five outcomes:
don't-care, replace-with-new, replace-with-old, merge and don't-merge.

Before further describing the structure and operation of the evaluation
stage, some terminology will be reviewed and defined. The coverage mask
432 is the data that records, for the subpixel sample points associated
with a pixel, whether each sample point is inside or outside the
primitive being rendered. A fragment for which all subpixel sample
points lie within the primitive is a fully-covered fragment. A fragment
for which at least one, but not all, subpixel sample points are within
the primitive is a partially covered fragment. Two fragments overlap or
intersect if the intersection of their coverage masks is a non-empty
set. If the intersection of the coverage masks of the two fragments is
the empty set, the fragments do not overlap.

The block coverage mask is formed by concatenating all the fragment
coverage masks in the block. If the intersection of the new block N's
coverage mask and the existing block E's coverage mask is the empty
set, then the two primitives for which the fragments were generated
probably do not overlap, and are therefore potentially mergable
according to the present invention.

When a fragment in position i of a block corresponds to a pixel that is
not covered by the primitive, the coverage mask for that fragment is
the null set and such fragments are referred to as invalid. In this
description, the term n_i refers to the i-th fragment from the new
block N and the term e_i refers to the i-th fragment from the existing
block E. Some fragments in a block may be invalid because, while a
block contains R fragments, representing data for a set of R contiguous
pixels, the image being rendered may cover only a portion of those R
pixels.
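The coverage-mask terminology above maps naturally onto bit operations. The sketch below is illustrative only: it assumes each coverage mask is held as a Python integer with one bit per subpixel sample, and the function names are hypothetical rather than taken from the described hardware.

```python
def classify(mask, num_samples):
    """Classify one fragment by its coverage mask (one bit per sample)."""
    full = (1 << num_samples) - 1
    if mask == 0:
        return "invalid"            # null set: pixel not covered at all
    if mask == full:
        return "fully covered"      # every sample lies inside the primitive
    return "partially covered"      # at least one, but not all, samples


def fragments_overlap(mask_a, mask_b):
    """Two fragments overlap if their coverage masks intersect."""
    return (mask_a & mask_b) != 0


def block_mask(fragment_masks, num_samples):
    """Concatenate per-fragment masks into one block coverage mask."""
    result = 0
    for i, mask in enumerate(fragment_masks):
        result |= mask << (i * num_samples)
    return result
```

With this representation, testing whether two blocks' coverage masks intersect is a single bitwise AND of the two concatenated masks.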

In FIG. 11, to generate the outcomes for a fragment, the evaluation
stage 456 has a tag comparison circuit 498, a valid/invalid
determination circuit 490, a merge determination circuit 491, and a
merge outcome circuit 497. The tag comparison circuit 498 compares the
remainder of the blocks' tags. If the tags do not match, then blocks N
and E are not at the same pixel address, and the merge outcome circuit
497 generates a don't-merge outcome for each fragment position in the
block.

Otherwise, the valid/invalid determination circuit 490 computes if n_i
is valid and if e_i is valid. The merge outcome circuit 497 generates
the don't-care outcome when fragments n_i and e_i are both invalid. The
replace-with-old outcome is generated whenever fragment n_i is invalid
and fragment e_i is valid, in which case the output fragment will
subsequently be e_i. The replace-with-new outcome is generated when
fragment e_i is invalid and fragment n_i is valid, in which case the
output fragment will subsequently be n_i.

If the tags match and fragments n_i and e_i are both valid,
then merge outcome circuit 497 uses results from the merge 
determination circuit 491 to determine whether to generate 
the merge or don't-merge outcome. The merge outcome 
circuit 497 generates the merge outcome when the two 
fragments' primitives have a common edge that bisects the 
block, the two fragments' blocks do not overlap (i.e. their 
block coverage masks do not intersect), the two fragments 
have roughly the same orientation in 3D space, and their 
color and depth values are sufficiently similar to allow 
merging without substantially affecting visual quality; oth- 
erwise it generates the don't-merge outcome. An edge 
comparison circuit 492 determines if the fragments' primi- 
tives share an edge that bisects the block. A mask compari- 
son circuit 493 determines whether the coverage masks of 
the fragments' blocks do not overlap. A depth comparison 
circuit 494 determines whether the depth of the fragments is 
sufficiently similar to allow merging. An orientation com- 
parison circuit 495 determines whether the fragments face in 
roughly the same direction in 3D space. A color comparison 
circuit 496 determines whether the colors of the fragments 
are sufficiently similar. 

In FIG. 12, a flowchart of the outcome generation circuit 497 is shown.
In step 499, if the remainder of the tags stored in fragment memory 482
do not match, the outcome generation circuit 497 generates a
don't-merge outcome (500). Otherwise, in step 501, if fragment n_i is
valid, it proceeds to step 509, otherwise to step 503. In step 503, if
fragment e_i is valid, the outcome generation circuit 497 generates a
replace-with-old outcome (507), otherwise both n_i and e_i are invalid
and it generates a don't-care outcome (505). In step 509, if fragment
e_i is valid, it proceeds to step 512 to determine if merging criteria
are met, otherwise the outcome generation circuit 497 generates a
replace-with-new outcome (511). In step 512, if the edge comparison
circuit 492 determines that the fragments' primitives do not share a
common edge that bisects the block, the outcome generation circuit 497
generates a don't-merge outcome (514). Otherwise, in step 516, if the
mask comparison circuit 493 (FIG. 11) determines that the fragments'
blocks overlap, the outcome generation circuit 497 generates a
don't-merge outcome (518). Otherwise, in step 520, if the depth
comparison circuit 494 (FIG. 11) determines that the depth of the
fragments is not sufficiently similar, the outcome generation circuit
497 (FIG. 11) generates a don't-merge outcome (522). Otherwise, in step
523, if the orientation comparison circuit 495 determines that the
fragments face in substantially different directions in 3D space, the
outcome generation circuit 497 generates a don't-merge outcome (525).
Otherwise, in step 524, if the color comparison circuit 496 (FIG. 11)
determines that the colors of the fragments are not sufficiently
similar, the outcome generation circuit 497 (FIG. 11) generates a
don't-merge outcome (526). Otherwise, in step 528, the outcome
generation circuit 497 (FIG. 12) generates a merge outcome.
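The decision sequence of FIG. 12 can be summarized in software form. This is a sketch under stated assumptions: the boolean parameters stand in for the outputs of the comparison circuits 492-496 and the valid/invalid determination circuit 490, and the names `Outcome` and `evaluate` are hypothetical. The thresholds inside those circuits are not modeled.

```python
from enum import Enum


class Outcome(Enum):
    DONT_CARE = "don't-care"
    REPLACE_WITH_NEW = "replace-with-new"
    REPLACE_WITH_OLD = "replace-with-old"
    MERGE = "merge"
    DONT_MERGE = "don't-merge"


def evaluate(tags_match, n_valid, e_valid, shared_bisecting_edge,
             blocks_overlap, depth_similar, orientation_similar,
             color_similar):
    """Return exactly one outcome for a fragment pair (n_i, e_i)."""
    if not tags_match:                       # step 499
        return Outcome.DONT_MERGE
    if not n_valid:                          # steps 501 and 503
        return Outcome.REPLACE_WITH_OLD if e_valid else Outcome.DONT_CARE
    if not e_valid:                          # step 509
        return Outcome.REPLACE_WITH_NEW
    # Steps 512-524: every merge criterion must hold.
    if (shared_bisecting_edge and not blocks_overlap and depth_similar
            and orientation_similar and color_similar):
        return Outcome.MERGE                 # step 528
    return Outcome.DONT_MERGE
```

The early returns mirror the flowchart's ordering, so the merge criteria are consulted only when the tags match and both fragments are valid.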

The evaluation stage 456 will be discussed in further detail below
including the criteria used by each of the determination circuits
492-496 in merge determination circuit 491. The fragment merging stage
and the update fragment storage stage will be described prior to
describing the evaluation stage in further detail.

The Fragment Merging Stage 

After the evaluation stage 456 (FIG. 11) generates the outcomes, the
new block N and the existing block E proceed to the fragment merging
stage 458 (FIG. 8). If the outcome for a fragment i is replace-with-new
or don't-merge, the fragment merging stage 458 outputs the new fragment
n_i. If the outcome for a fragment i is replace-with-old, the fragment
merging stage 458 outputs the old fragment e_i. If the outcome for a
fragment i is don't-care, the fragment merging stage 458 outputs an
invalid fragment with a coverage mask that is all 0's.

Otherwise, the two fragments can be merged, and the fragment merging
stage 458 (FIG. 8) creates a new merged fragment, referred to as m_i,
by combining the new and existing fragments' primitive edges, coverage
masks, normal vectors, depth values, depth gradients, and colors.

If a pair of fragments merge, then their corresponding primitives must
have a common edge that bisects the blocks. That is, two of N's vertex
hashes 444 must be identical to two of E's vertex hashes 444, and their
respective corresponding bisection bits 446 must be True. There can be
at most one such matching edge between the two blocks. The merge has
the effect of eliminating this common edge, for example by joining two
triangles into a quadrilateral. We thus have no further need to
represent the common edge and its respective corresponding bisection
bit 446 that is stored in both block N and block E. After one merge,
this leaves as many as four vertices and four bisection bits that might
be relevant to the merged surface. (In general, after n merges this
leaves as many as n+3 vertices and bisection bits that might be
relevant.)

Two triangles specified by the primitive edges 419 in two blocks prior
to merging are illustrated in FIG. 13A. The triangle with vertices
(1, 0, 48), (0, 5, 47) and (7, 5, 51) is rasterized first; the triangle
with vertices (1, 0, 48), (7, 5, 51), and (9, 2, 50) is rasterized
second. After two blocks along the shared edge with vertices (1, 0, 48)
and (7, 5, 51) are merged, the merged block is now part of the
quadrilateral formed from vertices (1, 0, 48), (0, 5, 47), (7, 5, 51),
and (9, 2, 50). This quadrilateral is shown in FIG. 13B, where the
eliminated shared edge from (1, 0, 48) to (7, 5, 51) is shown with a
dashed line.

However, a block's primitive edges 419 has just three vertex hashes 444
and three bisection bits 446. These can represent just two connected
edges of the four edges of the quadrilateral. We use two criteria, with
the first criterion taking precedence, to determine which edges to keep
in the merged block. First, if an edge has a False bisection bit, the
edge cannot be used to satisfy the criteria used by edge comparison
circuit 492. Thus, any such edges need not be stored in the merged
block's primitive edges 419. Second, the two unshared edges that are
part of the newer triangle are more important than the two unshared
edges that are part of the older triangle. This exploits the fact that
if the two triangles are part of a triangle strip or triangle fan, then
the next triangle in the strip or fan will occur on one of the edges of
the newer triangle.

Thus, up to two edges with True bisection bits are chosen for the
merged block, such that the number of edges with True bisection bits
from the newer triangle is maximized. (The open jaw from vertex hash
444-3 to 444-1 always has a False bisection bit 446-31 in the merged
block.)

Some examples are shown in FIGS. 13C, 13D, and 13E. In FIG. 13C, a 4x4
pixel merged block is bisected by the two connected edges with vertices
from (7, 5, 51) to (9, 2, 50), and thence to (1, 0, 48). (For ease of
illustration and reference to vertices, here the size and position of
the fragment block vary from figure to figure; in reality the block
size is constant, and these different merging situations apply to
triangles that are of different sizes.) Since there are only two
connected edges with True bisection bits, the second criterion doesn't
come into play.

In FIG. 13D, all four edges of the quadrilateral bisect an
8x8 pixel block. We again choose the two connected edges 
with vertices from (7, 5, 51) to (9, 2, 50), and thence to (1, 
0, 48), because the unshared edges from the newer triangle 
take priority over edges from the older triangle. 

FIG. 13E shows the oddest case. The 4x4 pixel block is bisected by two
unconnected edges, one from the newer triangle and one from the older
triangle. We cannot represent both of these edges, so we must choose
the single edge from the newer triangle, between (9, 2, 50) and
(1, 0, 48). Since the two edges connected to that chosen edge both have
False bisection bits, it is irrelevant which (if either) we store.

Although the preferred embodiment maintains three ver- 
tex hashes 444 and bisection bits 446, it should be clear that 
this scheme is extensible to any number of vertices and 
bisection bits. As the number of vertices increases, the 
decisions about which vertices to keep may become more 
complex, especially if unconnected edges may be chosen. 
More vertices would be desirable for applications that 
tessellate surfaces into triangles that aren't strips or fans, 
which increases the likelihood that triangles generated in the 
future will share an edge with the older triangle rather than 
the newer triangle. 

In FIG. 14, in the fragment merging stage, a subpixel mask merge
circuit 530 generates a coverage mask of the merged fragment m_i by
taking the union of the coverage masks of fragments n_i and e_i.

If the rasterizer supplies a normal vector 442 (FIG. 7C) for each
fragment, then the merged fragment m_i contains a renormalized average
of e_i's and n_i's normal vectors. A simple average of the normal
vector components in general creates a vector with non-unit length, and
so it must be renormalized to unit length. We do not need to compute
the length of the new vector, but can instead use a table lookup to
determine the renormalization multiplier.

Let (x_n, y_n, z_n) be fragment n's normal vector v_n, and
(x_e, y_e, z_e) be e's normal vector v_e. We first compute the sum of
the two vectors as:

$$ v_n + v_e = (x_n + x_e,\; y_n + y_e,\; z_n + z_e) $$

We desire the normalized merged vector v_m to be the summed vector
divided by its length:

$$ v_m = \frac{v_n + v_e}{\lVert v_n + v_e \rVert} $$

The length of the summed vector is really:

$$ \sqrt{(x_n + x_e)^2 + (y_n + y_e)^2 + (z_n + z_e)^2} $$

Expanding and regrouping:

$$ \sqrt{(x_n^2 + y_n^2 + z_n^2) + (x_e^2 + y_e^2 + z_e^2) + 2(x_n x_e + y_n y_e + z_n z_e)} $$

Since the original vectors were normalized to a length of 1, we thus
have:

$$ \sqrt{2\,(1 + x_n x_e + y_n y_e + z_n z_e)} $$

Finally, the last three terms in the above equation for determining the
length of the merged vector are the dot product v_n · v_e of the normal
vectors, which is computed in advance by the evaluation stage 456, as
discussed below in the Evaluation Stage: Merge Criteria section. As a
result, the length of the summed vector can be represented as

$$ \sqrt{2\,(1 + v_n \cdot v_e)} $$

and more importantly, the value of the dot product v_n · v_e of the
normal vectors is obtained from the evaluation stage 456 and therefore
does not have to be re-computed.

Further, since the vectors v_n and v_e are normalized, the dot product
is between -1 and 1, inclusive. To renormalize the sum of the vectors,
we can use a lookup table of, for example, 64 or 128 entries. The index
to the table is the dot product. The output of the table is the
function:

$$ \frac{1}{\sqrt{2\,(1 + \text{input})}} $$

The sum of the normal vectors is multiplied by the table output to
create the renormalized, merged vector v_m.
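A table-driven renormalization of this kind might look as follows. This is a minimal sketch, not the circuit itself. The quantization scheme (uniform bins over [-1, 1], sampled at bin centers) and the names `RENORM_TABLE` and `merge_normals` are assumptions, chosen partly so that the table never evaluates the function at a dot product of exactly -1, where it is undefined.

```python
import math

TABLE_SIZE = 128  # the text suggests 64 or 128 entries


def bin_center(k):
    """Dot-product value at the center of table bin k (assumed layout)."""
    return -1.0 + (k + 0.5) * 2.0 / TABLE_SIZE


# Each entry holds 1 / sqrt(2 * (1 + dot)) for its bin's center value.
RENORM_TABLE = [1.0 / math.sqrt(2.0 * (1.0 + bin_center(k)))
                for k in range(TABLE_SIZE)]


def merge_normals(v_n, v_e, dot=None):
    """Sum two unit normals and rescale using the lookup table.

    `dot` may be passed in when it was already computed elsewhere;
    otherwise it is computed here for convenience.
    """
    if dot is None:
        dot = sum(a * b for a, b in zip(v_n, v_e))
    index = int((dot + 1.0) * 0.5 * TABLE_SIZE)
    index = min(TABLE_SIZE - 1, max(0, index))  # clamp dot == 1 into range
    scale = RENORM_TABLE[index]
    return tuple((a + b) * scale for a, b in zip(v_n, v_e))
```

The result is only approximately unit length because of the table quantization. The error is small when the two normals point in similar directions, which is the case in which fragments are merged.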

At least two approaches can be used to generate the color values of the
merged fragment m_i. As shown in FIG. 15A, a subpixel color merge
circuit 532 generates color values for the merged fragment m_i by
taking a 50/50 blend of each color component from n_i and e_i. An adder
534 adds the corresponding components of the new and existing
fragments, n_i and e_i respectively, and a divider 536 (implemented as
a wire shift) divides the resulting sum by two.

As shown in FIG. 15B, in a second embodiment, the subpixel color merge
circuit 540 generates color values for the merged fragment m_i using a
weighted average in which each fragment's color components are
multiplied by the number of samples in its coverage mask, the two
weighted colors are summed, and then divided by the number of samples
in the merged coverage mask. This approach provides more accurate
results, but requires more computation.

A multiplier 542 multiplies the color values of the new fragment n_i by
the number of samples in the coverage mask for the new fragment n_i.
Another multiplier 544 multiplies the color values of the existing
fragment e_i by the number of samples in the coverage mask for the
existing fragment e_i. An adder 546 sums the output of the multipliers
542, 544. A divider 548 divides the output of the adder 546 by the
number of samples in the coverage mask for the merged fragment m_i. In
one implementation, the divider 548 is implemented using a multiplier
that multiplies the output of the adder 546 by the reciprocal of the
number of samples in the coverage mask for the merged fragment m_i,
because the divisor has a small set of small values and multiplication
is faster than division.
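The two color-merging approaches can be compared with a short example. The sketch below assumes integer color components and coverage masks stored as integers with one bit per sample. The function names are hypothetical, and the weighted version assumes the two masks are non-empty and do not overlap, as is the case for fragments that merge.

```python
def popcount(mask):
    """Number of samples set in a coverage mask."""
    return bin(mask).count("1")


def blend_equal(color_n, color_e):
    """50/50 blend: add the components, then divide by two with a shift."""
    return tuple((a + b) >> 1 for a, b in zip(color_n, color_e))


def blend_weighted(color_n, mask_n, color_e, mask_e):
    """Blend weighted by each fragment's number of covered samples."""
    w_n = popcount(mask_n)
    w_e = popcount(mask_e)
    w_m = popcount(mask_n | mask_e)   # samples in the merged coverage mask
    return tuple((a * w_n + b * w_e) // w_m
                 for a, b in zip(color_n, color_e))
```

For a new fragment covering three samples with red value 100 and an existing fragment covering one sample with red value 200, the equal blend gives 150 while the weighted blend gives 125, reflecting the larger share of the new fragment.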

The fragment merging stage generates the depth values for the merged
fragment m_i using either an average or weighted average using the
circuit described above with respect to FIGS. 15A and 15B,
respectively, except that depth values are processed instead of color
values.

As shown in FIG. 16A, the fragment merging stage also generates Z
gradient values 438 for the merged fragment m_i. A gradient merge
circuit 550-A has a comparator 552 that compares the squares of the
lengths of the gradients (i.e., the sum of the squares of the gradient
components) of the new and existing fragments, n_i and e_i. A
multiplexor 554, based on the result of the comparison, outputs the
gradient with the shortest length as the gradient for the merged
fragment m_i. That is, using the depth gradients as an example, let the
components for the new fragment be (Z_x^n, Z_y^n), and those of the
existing fragment be (Z_x^e, Z_y^e). In this case, the merged
fragment's depth gradient will be that of the new fragment if the
following relationship is true:

$$ Z_x^n * Z_x^n + Z_y^n * Z_y^n < Z_x^e * Z_x^e + Z_y^e * Z_y^e $$

where * represents the multiplication operation.

Otherwise, the merged fragment's depth gradient will be that of the
existing fragment's depth gradient.

In FIG. 16B, alternately, the gradient merge circuit 550-B determines
the gradient components (e.g., Z_x^m, Z_y^m) of the merged fragment m_i
individually by, for each component of m_i, selecting the corresponding
component of n_i if its absolute value is less than that of the
corresponding component of e_i, and otherwise, selecting the
corresponding component of e_i. That is, using depth gradients as an
example, Z_x^m, the x-component of the merged fragment, will be set
equal to Z_x^e, the x-component of the depth gradient of the existing
fragment e_i, if the absolute value of Z_x^e is less than the absolute
value of Z_x^n; otherwise Z_x^m will be set equal to Z_x^n. An absolute
value comparator 556 compares the absolute values of each corresponding
component from the existing fragment e_i and the new fragment n_i, and
a multiplexor 558 outputs one of the components based on the
determination of the absolute value comparator 556. One copy of the
circuit shown in FIG. 16B is used for each of the two gradient
components.
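Both gradient-selection rules are simple enough to state in a few lines. The following sketch treats a gradient as a pair of numbers (x and y components); the function names are hypothetical. When the absolute values tie, this sketch picks the existing fragment's component, which is one reading of the text. The magnitude is the same either way, although the sign may differ.

```python
def merge_gradient_shortest(grad_n, grad_e):
    """FIG. 16A style: keep whichever gradient has the shorter length.

    Squared lengths are compared, so no square root is needed.
    """
    len2_n = grad_n[0] * grad_n[0] + grad_n[1] * grad_n[1]
    len2_e = grad_e[0] * grad_e[0] + grad_e[1] * grad_e[1]
    return grad_n if len2_n < len2_e else grad_e


def merge_gradient_per_component(grad_n, grad_e):
    """FIG. 16B style: pick, per component, the value of smaller magnitude."""
    return tuple(n if abs(n) < abs(e) else e
                 for n, e in zip(grad_n, grad_e))
```

For example, with a new gradient of (3, -1) and an existing gradient of (-2, 2), the first rule keeps (-2, 2) as a whole, while the second rule yields (-2, -1).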

As shown in FIGS. 16C and 16D, other gradient merge circuits 550-C,
550-D may be used to generate gradient values for the merged fragment
m_i using an average or weighted average, respectively, as described
above in conjunction with FIGS. 15A and 15B.

Update Fragment Storage Stage

In FIG. 17, after the R fragments in the new block N have been compared
to the R fragments in the existing block E and merging is complete, the
update fragment storage stage 460 updates the contents of the new block
N and the existing block E with the fragments that were merged and with
those fragments that are to replace other fragments. An update block
circuit 562 updates the fragments in the new block N and/or the
existing block E. In one embodiment, the block into which each fragment
is written is determined independently of the outcomes for the other
fragments. An update fragment storage circuit 564 stores the updated
new block N and/or the existing block E in the fragment storage.

Because the comparison of each new and existing fragment pair results
in a single fragment to be stored (the new fragment n_i, the existing
fragment e_i, or the merged fragment m_i), up to R fragments are
updated, and up to R other fragments may be invalidated. A fragment is
invalidated by setting its coverage mask equal to zero, that is, the
coverage mask is the null set.

Note that the block (N or E) into which a given fragment is written
depends on the comparison outcome and whether the fragment is likely to
merge again in the future as determined by the likely-to-merge bit
associated with block N.

Table 1 below summarizes the relationship between the comparison
outcome, the fragment output from fragment merging stage 458, the
likelihood that block N's fragments will merge again in the future, and
the block into which the fragment is written.

TABLE 1

The relationship between comparison outcomes and where a fragment will be written

Outcome            Fragment       N Likely to Merge     N Unlikely to Merge
don't-merge        new, n         leave in block N      leave in block N
replace-with-new   new, n         leave in block N      move to block E
replace-with-old   existing, e    move to block N       leave in block E
merge              merged, m      write into block N    write into block E
don't-care         none valid     not applicable        not applicable



Note that the likely-to-merge bit of block N identifies if N's
fragments were generated along the most recent primitive edge of a
tessellated surface, and thus have a good chance of merging with
fragment blocks that will soon enter the merge buffer. The
likely-to-merge bit of block N, along with the outcomes, determines
whether a fragment is written into the new block N or the existing
block E. When either a fragment e_i or a fragment m_i is written into
block N, the corresponding fragment at position i in block E is
invalidated. In the case of merge, the fragment in block E is
invalidated because it has been superseded by the merged fragment, and
in the case of a "replace-with-old" operation, the fragment in block E
is invalidated because that fragment has been moved into block N.
Similarly, if a fragment n_i or m_i is written to block E, the
corresponding fragment in block N is invalidated.

Referring to FIG. 18, a flowchart of the operation of the 
update block circuit 562 (FIG. 17) will be used to explain 
Table 1 in more detail. In step 582, when the outcome of the 
evaluation circuit is don't-care, no fragment is written into 
the new or the existing blocks because both fragments are 
invalid. 

Otherwise, in step 584, when the outcome is merge, step 586 writes the
merged fragment m_i to the new block N if N is likely-to-merge, and
invalidates the existing fragment e_i in block E. If N is not
likely-to-merge, then the merged fragment m_i is written to the
existing block E, and the new fragment n_i in block N is invalidated.

When step 588 determines that the outcome is replace-with-new, in step
590, if N is likely-to-merge, the new fragment n_i remains in the new
block N. If N is not likely-to-merge, the new fragment n_i is written
into the existing block E and the new fragment n_i in block N is
invalidated.

In step 592, when the outcome is don't-merge, the new and existing
fragments remain in their respective locations in the new and existing
blocks. Otherwise, the outcome is replace-with-old, and in step 596 if
the new block N is likely-to-merge, then the existing fragment e_i is
written to the new block N, and the corresponding fragment in the
existing block E is invalidated. If N is not likely-to-merge, step 596
leaves the existing fragment e_i in the existing block E.
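The placement rules of Table 1 and FIG. 18 can be expressed as a single function per fragment position. This is an illustrative sketch: the function name `place_fragment` and the use of `None` to stand for an invalidated fragment (a coverage mask of zero) are assumptions, and the outcome is passed as the plain strings used in the text.

```python
def place_fragment(outcome, n, e, m, n_likely_to_merge):
    """Return (fragment kept in block N, fragment kept in block E).

    n, e and m are the new, existing and merged fragments at one
    position; None marks a slot that is invalid after the update.
    """
    if outcome == "don't-care":
        return (None, None)            # both inputs were invalid
    if outcome == "don't-merge":
        return (n, e)                  # each fragment stays where it is
    if outcome == "merge":
        survivor = m
    elif outcome == "replace-with-new":
        survivor = n
    elif outcome == "replace-with-old":
        survivor = e
    else:
        raise ValueError("unknown outcome: " + outcome)
    # One fragment survives; it goes to N if N is likely to merge again,
    # otherwise to E, and the slot in the other block is invalidated.
    return (survivor, None) if n_likely_to_merge else (None, survivor)
```

Grouping the three single-survivor outcomes makes the pattern in Table 1 visible: only the choice of surviving fragment differs, while the destination depends solely on block N's likely-to-merge bit.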

In this way, we move as many fragments as possible (old, new, and
merged) into a new block N containing fragments that are likely to
merge. Since the new block N will be ejected after the existing block
E, this improves the odds that these fragments may indeed merge in the
future before being ejected from fragment storage. On the other hand,
if the new block N doesn't contain fragments likely to merge, we move
as many fragments as possible into the existing block E. This leaves
more space for future fragments in the new block N and in the best case
empties the new block N completely, so that it need not be written to,
and take up space in, fragment storage 396.

It will be appreciated that the relationships shown in Table 1
correspond to but one embodiment of the present invention. One of
ordinary skill in the art may select any suitable method for
determining how to update the merge buffer in accordance with the
principles of the present invention. For example, in an alternate
embodiment, likely-to-merge information may be unavailable from the
fragment generator, and so might be assumed to be always false. In this
case, fragments from the new block N are written into the existing
block E whenever possible (i.e., whenever the fragments are merged, and
whenever the old fragment in block E is replaced by the new fragment in
block N). Or likely-to-merge might be assumed to be always true, so
that fragments from the existing block E are written into the new block
N whenever possible.

Once all the fragments output from fragment merging stage 458 have been
processed by the update block circuit 562, the update fragment storage
circuit 564 (FIG. 17) examines the new block N. If N still has at least
one valid fragment, the entry identified by the fragment storage tail
pointer is allocated and the new block N is copied into the allocated
entry. In addition, the existing block E is marked as not available for
merging with future blocks, by setting its mergable bit to False,
because only the most recently inserted block for each tag value is
allowed to merge with future blocks.

The update fragment storage circuit 564 then copies any modified
portions of the existing block E back into its original entry of
fragment storage. The copy-back process updates the entry to reflect
the fragments that are no longer valid, the fragments that have been
replaced with a merged fragment and the fragments from the new block N.

To decrease the amount of time to copy blocks into the merge buffer and
decrease hardware cost, in an alternate embodiment, the criteria for
writing fragments are modified to prevent any new or merged fragments
from being written into the existing block E when at least one of the
fragments output from the fragment merging stage 458 (FIG. 8) must be
written to block N by the update fragment storage stage 460.

This alternate embodiment can reduce the number of
write ports into fragment memory 482 (FIG. 10B) from two
down to one, which in turn greatly reduces the chip real
estate occupied by fragment memory 482. To achieve this
reduction, the mergable bit and a valid bit for each fragment
are allocated to a narrow fragment valid memory with two
write ports. The valid bits override the coverage mask bits
432 (FIG. 7C) stored in fragment memory 482. (In essence,
the valid bits are logically ANDed with the coverage mask
bits to obtain the true coverage masks.) Whenever a fragment
is copied from an existing block E to the new block N,
only the fragment valid bits for E must be set to zero. The
rest of the data for block E, stored in fragment memory 482,
need not be updated. Similarly, whenever a new block N
must be stored in fragment memory 482, and so E's mergable
bit must be set False, the rest of the data for block E
in fragment memory 482 need not be updated.
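
As an illustration of the valid-bit scheme described above, the following minimal Python sketch shows how a true coverage mask could be derived by ANDing a stored mask with a per-fragment valid bit. The function names, the integer bitmask representation, and the default of four samples per fragment are assumptions made for this example; they are not taken from the specification.

```python
from typing import List


def effective_coverage(mask: int, valid: bool, samples: int = 4) -> int:
    """Return the true coverage mask.

    The stored mask is ANDed with the valid bit replicated across all
    sample positions (assumed layout: one bit per sample point).
    """
    all_ones = (1 << samples) - 1
    return mask & (all_ones if valid else 0)


def invalidate_fragment(valid_bits: List[bool], index: int) -> None:
    """Model moving a fragment out of block E: only its valid bit is cleared.

    The (hypothetical) wide fragment memory holding the mask is not touched.
    """
    valid_bits[index] = False


# Block E holds two fragments; the stored masks never change.
stored_masks = [0b1011, 0b0110]
valid_bits = [True, True]
invalidate_fragment(valid_bits, 0)
true_masks = [effective_coverage(m, v) for m, v in zip(stored_masks, valid_bits)]
```

In this sketch only the narrow `valid_bits` list is written when a fragment leaves block E, which mirrors the reason given above for needing a second write port only on the small valid memory.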

In one embodiment, fragment memory 482 (FIG. 10B)
provides two read ports. One read port is used to read out a
block from fragment memory 482 in order to write the block
to the output queue 392, so that the block is eventually sent
to the frame buffer update 382 (FIG. 5). A second read port
is used to read out an existing block E whose tag matched
that of a new block N.

In an alternate implementation, most of fragment memory 
482 is implemented with a single read port (except for the 
valid bits and mergable bit mentioned above, which require 
two read ports). This alternate implementation assumes that 
if an existing block E matches N's tag, then the two blocks 
will usually merge completely into a single block. Thus, we 
read from fragment memory 482, on average, not much 
more than one block for each block processed. If no tag from 
associative memory 492 matches block N's tag, we need not 
read out an existing block E, but must later read N to retire
it, so a single read is required to process N. On the other
hand, if a tag matches and we must read out an existing
block E, this embodiment expects a don't-merge outcome to
be rare. If no fragment has a don't-merge outcome, then the
two blocks are coalesced into a single block, leaving either
N or E empty. We need not subsequently read out most of
the empty block to retire it, as the valid bits, which have two
read ports, indicate whether a block has any valid fragments.
Again, a single read suffices to process N. The only case in 
which we need two reads to process a new block N is when 
we read out an existing block E for merging, but then a rare 
don't-merge outcome leaves valid fragments in both N and 
E. When only one read port is available for both functions, 
reading a block for ejection has priority over reading a block 
for possible merging, in order to ensure that an entry in the 
fragment storage can be allocated for a new block if needed. 
If, simultaneous with ejection, a merge read was required, 
the merge read would be stalled. 

In another alternate implementation, to reduce the storage 
cost per fragment, some information is stored on a per-block 
basis such as the Z gradients 438 (FIG. 7C) and/or the
surface normal vectors 442. 

Evaluation Stage: Merge Criteria 

The determination of whether the merge buffer should 
merge a new fragment n_i and an existing fragment e_i is
based on an estimation of whether the new and existing 
fragments belong to adjacent, non-overlapping primitives of 
the same tessellated surface. Further, to enable a single 
merged fragment to adequately represent the two fragments 
with a minimum of artifacts, we wish to also establish that 
the primitives face in approximately the same direction 
(don't bend too sharply), that neither of the primitives is 
being viewed nearly edge-on, and that the primitives are lit 
or textured with similar colors. In a preferred embodiment, 
described below, this determination is made by comparing 
the information associated with the fragments including the 
primitive edges, coverage masks, normal vectors (if 
available), depth values (optional), depth gradients, and 
color. 

Here is a summary of the merge criteria: 

1. Primitive edge comparison. This test attempts to deter- 
mine if the two primitives are physically adjacent and 
connected in 3D space, by looking for a shared edge 
between the primitives. It is unlikely that two primi- 
tives that are not adjacent and connected in 3D space 
will have two 3D vertices in common. However, this 
possibility becomes more likely when x, y, and z 
coordinates are mapped to discrete values of limited 
precision, and the primitive edge comparison test can 
be fooled by such an occurrence. If the vertices are 
hashed into fewer bits, the test may also be fooled by 
two different edges whose vertices hash into the same 
two vertex hash values.

2. Coverage mask overlap. This test determines if there is
any overlap between the two primitives' 3D projection
into 2D space, by comparing the two fragment coverage
masks. Since the coverage mask overlap test examines
only the projection in 2D space, does not check for
adjacency, and uses the discrete coverage mask
samples rather than a continuous representation of the
primitive edges, it is less stringent than the primitive
edge comparison. For example, two primitives that are
merely near each other in 2D screen space but are not
adjacent in 2D space, let alone adjacent or even near in
3D space, can pass this test. However, it provides an
inexpensive secondary test to reject two different edges
that hash into the same vertex hash values, and thus
fool the primitive edge comparison. In the event that
normal vectors are not available, it further tests that two
primitives of the same tessellated surface show the
same side to the viewer (both front face or both back
face).

3. Orientation tests. Even if two primitive objects appear
to be part of the same surface, they should not be
merged if any of the following conditions are true:

(a) the primitive objects face in directions that are too
different, because a single Z value and Z gradient
vector can't adequately represent the two primitives
(a merged fragment would bevel a sharp edge, possibly
to the point of allowing another, obscured
surface to "pop through" the bevel); or

(b) one of the surfaces is nearly edge-on, because then
its Z gradient will be relatively large, and thus may
fool the Z projection test below; or

(c) the two primitives show different faces (front and
back, or back and front) to the viewer.

If the rasterizer provides per-primitive or per-fragment
normal vectors, these orientation tests are quite accurate.
If the rasterizer does not provide normal vectors, orientation
tests (a) and (b) may be ignored, with only a small
increase in artifacts. Under typical conditions, the knowledge
that the triangles have been shaded as "curved" surfaces
is sufficient to establish that they face in substantially
the same direction near the shared edge.

Alternatively (and optionally), orientation tests (a) and (b)
can be approximated by using the Z gradient information as
described below in the alternative embodiment. These
approximations can be quite inaccurate. The Z gradient tests
will not pass fragments that shouldn't be merged, but have
the opposite problem of rejecting many surfaces that may in
fact have similarly oriented normal vectors. In particular, Z
gradient tests tend to reject two surfaces that are nearly
face-on to the observer, thus reducing the efficiency of the
merge buffer. Orientation test (c) cannot be approximated by
using Z gradient information, but the coverage mask test
above will reject two primitives that show different faces if
the primitives do indeed belong to the same tessellated
surface.

4. Z projection test. This test compares Z values to
determine if two primitives are approximately the same
distance from the viewer. It provides a useful, but not
cheap, tertiary backup to tests 1 and 2 above. If the
merge buffer is relatively small, it is probably desirable
to avoid the real estate (i.e., registers and other
circuitry) required to implement the Z projection test,
and instead allocate a large number of bits to the vertex
hash 444. Either the full Z coordinate might be stored
in the vertex hash, or a hash function can be applied that
reduces the number of bits in Z by a moderate amount.
If the merge buffer is large, however, it may be desirable
to use a more aggressive hash function on the
primitive vertices, which will cause more aliasing that
can make test 1 less accurate. The Z projection test can
then help weed out primitives that were incorrectly
determined to share an edge. The Z projection test can
be "fooled" by a primitive object that is viewed nearly
edge-on, as such an object has a relatively large Z
gradient and thus its projection may span such a large
range of Z values that it encompasses almost anything
in the scene.

5. Color tests. Even if two primitives are adjacent parts of 
the same surface, and are similarly oriented, they may 
still have a large color (or alpha transparency) variation 
(especially near reflected highlights, or because of 
texture mapping). If the color, or any component of the 
color, of the two primitives differ by more than a 
threshold value, the fragments should not be merged. 
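
As a rough illustration of the color test summarized in item 5, the sketch below compares RGBA components one by one against a threshold. The function name, the tuple representation of a color, the use of components in the range 0 to 1, and the default threshold are assumptions for this example only.

```python
from typing import Sequence


def colors_similar(c1: Sequence[float], c2: Sequence[float],
                   threshold: float = 1.0 / 32.0) -> bool:
    """Return True when no RGBA component differs by more than threshold.

    c1 and c2 are assumed to be (red, green, blue, alpha) tuples.
    """
    return all(abs(a - b) <= threshold for a, b in zip(c1, c2))


# Two nearly identical colors pass; a large change in red fails.
close = colors_similar((0.50, 0.50, 0.50, 1.0), (0.52, 0.50, 0.49, 1.0))
far = colors_similar((0.50, 0.50, 0.50, 1.0), (0.60, 0.50, 0.50, 1.0))
```

A fragment pair that fails this check would receive a don't-merge outcome regardless of how the other criteria turn out.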

Primitive Edge Comparison 

The first criterion for merging considers the primitive 
edges 419. Two fragments are merged only if they have a 
common edge that bisects the blocks (that is, both blocks 
have in common two vertex hashes 444, with a True 
corresponding bisection bit 446). As explained above, the 
vertex hashes are stored in clockwise order. The common 
edge may be represented by any of the three pairs (444-1, 
444-2), (444-2, 444-3), or (444-3, 444-1) in the older block 
E, and any of the three pairs (444-2, 444-1), (444-3, 444-2), 
(444-1, 444-3) in the newer block N. Each vertex hash 444 
in block E must be compared to each vertex hash 444 in 
block N, so nine vertex comparisons are required to implement
the edge comparison function. This test is performed
once for the entire block, and then the result is fed into the 
individual fragment outcome circuits. 
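
The edge comparison described above can be sketched in Python as follows. This is only an illustration under stated assumptions: vertex hashes are modeled as integers stored in clockwise order, and the bisection bits are modeled as one boolean per edge, where edge i runs from vertex i to vertex (i + 1) mod 3. The specification does not spell out this exact layout, and the function name is hypothetical. Because both triangles are stored clockwise, a shared edge appears with its two vertices in opposite order in the two blocks.

```python
from typing import Optional, Sequence, Tuple


def find_shared_edge(e_hash: Sequence[int], e_bisect: Sequence[bool],
                     n_hash: Sequence[int], n_bisect: Sequence[bool]
                     ) -> Optional[Tuple[int, int]]:
    """Return (edge index in E, edge index in N) of a shared bisecting edge.

    Returns None when the two blocks have no such edge in common.
    """
    for i in range(3):
        ea, eb = e_hash[i], e_hash[(i + 1) % 3]
        for j in range(3):
            na, nb = n_hash[j], n_hash[(j + 1) % 3]
            # The same edge is traversed in opposite directions.
            if ea == nb and eb == na and e_bisect[i] and n_bisect[j]:
                return (i, j)
    return None


all_true = (True, True, True)
# E = (10, 20, 30) and N = (20, 10, 40) share the edge between 10 and 20.
shared = find_shared_edge((10, 20, 30), all_true, (20, 10, 40), all_true)
# Same two vertices, but traversed in the same direction: not a match.
not_shared = find_shared_edge((10, 20, 30), all_true, (10, 20, 40), all_true)
```

A hardware implementation would precompute the nine vertex equalities once and combine them combinationally; the nested loops here simply recompute them for clarity.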

Coverage Mask Overlap 

The second criterion for merging considers the coverage 
masks. Two fragments are merged only if the intersection of 
their respective blocks' coverage masks is the null set, that
is, all corresponding pairs of fragments in the two blocks do 
not overlap. If the intersection of the block coverage masks 
is not null, then either the two fragments do not belong to 
adjacent primitives on the same tessellated surface, or 
belong to adjacent primitives on the same tessellated sur- 
faces in which one primitive has its front face visible and the 
other has its back face visible. As discussed above, merging 
two such fragments would substantially increase the poten- 
tial for artifacts. As with the primitive edges, the coverage 
mask overlap test is performed once for the entire block, and 
then the result is fed into the individual fragment outcome 
circuits. 

In FIG. 19, an exemplary mask comparison circuit 600 
determines whether the coverage masks of the existing and 
new blocks overlap. The coverage masks of two blocks 
overlap if, for at least one position in the coverage masks, 
both masks have a one bit at that position. Let S denote the
number of sample points per fragment, R denote the number
of fragments per block, and f_ij denote the j-th coverage mask
bit of the i-th fragment in block F. A set of AND gates
602-604 determines whether the individual coverage mask
bits of corresponding fragment pairs (n_11, e_11) . . . (n_1S, e_1S) . . .
(n_RS, e_RS) overlap. A NOR gate 606 generates a
non-overlapping mask signal with a value of one when the block
coverage masks do not overlap.
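
The AND/NOR structure of the mask comparison can be mimicked in software. In the sketch below, each fragment's coverage mask is assumed to be packed into an integer with one bit per sample point, and each block is a list of R such masks; the function name and this packing are assumptions for the example.

```python
from typing import Sequence


def masks_do_not_overlap(n_masks: Sequence[int], e_masks: Sequence[int]) -> bool:
    """Return True when no sample position is covered in both blocks.

    Bitwise AND plays the role of the per-bit AND gates, and the final
    comparison with zero plays the role of the NOR gate.
    """
    overlap = 0
    for n_mask, e_mask in zip(n_masks, e_masks):
        overlap |= n_mask & e_mask
    return overlap == 0


# Two fragments per block, four samples per fragment.
disjoint = masks_do_not_overlap([0b0011, 0b0000], [0b1100, 0b1111])
overlapping = masks_do_not_overlap([0b0011, 0b0000], [0b0001, 0b1111])
```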

Orientation Tests 

The third set of criteria for merging involves the orientation
of the two fragments in 3D space, and is composed of
three parts. We wish to determine that (a) two fragments face
in substantially the same direction, (b) that neither fragment
is viewed nearly edge-on, and (c) that both fragments
present the same face (front or back) to the viewer. We
describe two implementations of these tests. In the first
implementation, the rasterizer provides normal vectors for
each fragment, block, or primitive, and thus orientation
information is directly available. In the second
implementation, such vectors are not available, and instead
some rough approximations based upon Z gradients are used
for tests (a) and (b).

If normal vectors are available from the rasterizer, it can
send these normal vectors down the pipeline to the merge
buffer, which can then compare the angular displacement
between the two fragments' normal vectors. The cosine of
the angle theta between two such normal vectors is easily
computed using the dot product of the vectors. That is, if (x_n,
y_n, z_n) is the normal vector v_n for a fragment n_i, and (x_e, y_e,
z_e) is the normal vector v_e for a fragment e_i, we have:

cosine(theta) = v_n · v_e = x_n*x_e + y_n*y_e + z_n*z_e

To ensure that two normal vectors are within some
maximum angle maxTheta, we must test that:

cosine(theta) > cosine(maxTheta)

Since cosine(0) is equal to 1 and the cosine decreases as the
angle grows toward 180°, we test cosine(theta)>cosine
(maxTheta) to establish that theta<maxTheta.

A good value for maxTheta depends upon the granularity
of the normal vectors supplied by the rasterizer. If the
rasterizer provides normal vectors on a per-fragment or
per-block basis, then the interpolation of the three normals
provided at the vertices of the triangle should result in
normal vectors that are identical along the shared edge.
Thus, the normal vectors for fragments or blocks in the two
primitives that are near the shared edge will be within a few
degrees of each other, and maxTheta might be, for example,
5°. If the rasterizer provides a normal vector on a
per-triangle basis, then the normals will be separated by a larger
angle, and maxTheta might be chosen to be, for example,
20°.

To ensure that neither normal vector represents a nearly
edge-on view, we can compute the angle eye_n between the
z axis (that is, the viewer) and the normal vector v_n, as:

cosine(eye_n) = v_n · (0, 0, 1) = z_n

and eye_e similarly:

cosine(eye_e) = v_e · (0, 0, 1) = z_e

We then test that these angles are within some maximum
angle maxEye, for example 85°. Since the vectors may be
pointing at the observer (for front-facing fragments), or
away from the observer (for back-facing fragments), we
must use the absolute value of the cosines:

abs(z_n) > cosine(maxEye) AND abs(z_e) > cosine(maxEye)

Finally, we can test that both vectors face the viewer, or
that both vectors face away from the viewer. This merely
requires testing that:

sign(z_n) = sign(z_e)
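
The three normal-vector tests above can be combined in a short Python sketch. It assumes unit-length normal vectors given as (x, y, z) tuples with the z axis pointing at the viewer; the function name and the default angles (20° and 85°, the example values mentioned in the text) are choices made for this illustration.

```python
import math
from typing import Sequence


def orientation_ok(v_n: Sequence[float], v_e: Sequence[float],
                   max_theta_deg: float = 20.0,
                   max_eye_deg: float = 85.0) -> bool:
    """Apply orientation tests (a), (b) and (c) to two unit normals."""
    # (a) angle between the normals is below maxTheta
    cos_theta = sum(a * b for a, b in zip(v_n, v_e))
    if cos_theta <= math.cos(math.radians(max_theta_deg)):
        return False
    # (b) neither normal is nearly edge-on to the viewer
    cos_max_eye = math.cos(math.radians(max_eye_deg))
    if abs(v_n[2]) <= cos_max_eye or abs(v_e[2]) <= cos_max_eye:
        return False
    # (c) both face the viewer, or both face away
    return (v_n[2] > 0) == (v_e[2] > 0)


def tilted(deg: float) -> tuple:
    """Unit normal tilted by deg degrees away from the z axis, in the y-z plane."""
    r = math.radians(deg)
    return (0.0, math.sin(r), math.cos(r))


ok = orientation_ok(tilted(0), tilted(10))         # 10 degrees apart
too_bent = orientation_ok(tilted(0), tilted(30))   # 30 degrees apart
edge_on = orientation_ok(tilted(88), tilted(88))   # both nearly edge-on
flipped = orientation_ok(tilted(0), (0.0, 0.0, -1.0))
```

Only three multiplies and two adds are needed for test (a), and tests (b) and (c) only inspect the z components, which is why this variant is cheap when normals are available.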

More typically, the graphics accelerator supports a
simpler, less accurate lighting model like Gouraud shading.
This lighting model implicitly assumes that the surface
normal changes across the primitive object, but per-fragment
normals are not explicitly computed. In this case,
surface normal vectors are unavailable to the merge buffer.

In a low cost implementation, the mergable bit 416 can be
used as an indication if the two primitives face in substantially
the same direction, thus avoiding orientation test (a)
entirely. If the mergable bit 416 is "not mergable" because
the triangle is flat-shaded, then merging is suppressed for all
the fragments in the block. On the other hand, if the triangle
is shaded as a curved surface, and the mergable bit 416 is
"mergable," we can assume that the two primitives, in the
vicinity of the shared edge, face in substantially the same
direction. After all, if the triangles are treated as curved
surfaces, then they join smoothly along the shared edge. This
assumption is not infallible: it can be violated if two small
triangles are joined at a sharp angle along the shared edge.
Although the (implicit) interpolation of the normal vector
produces identical vectors along the shared edge, the small
triangle size and the sharp angle conspire to alter the normal
vectors at a high rate of change for points not exactly on the
edge. Thus, points near the shared edge might have substantially
different normal vectors. Such cases are unlikely to
occur, however, as they result in objectionable artifacts that
are unrelated to merging.

Alternatively, two optional methods can be used individually
or jointly to probabilistically determine whether the two
fragments face substantially the same direction. The first
method determines if the two surface normals are tilted
approximately the same amount away from the viewer (that
is, have roughly the same angular displacement from the z
axis), and also usually eliminates fragments that are tilted
nearly edge-on to the viewer. The second method determines
if the two surface normals are rotated approximately the
same direction in the x-y plane. Both methods are based
upon information contained within the z gradients. Since the
z gradients are constant across the primitive object, these
methods must erroneously assume that all fragments in an
object have the same normal vector. Further, because they
cannot compute an actual angular displacement between two
normal vectors, these probabilistic tests will also cause
many undesirable don't-merge outcomes for fragments that
are nearly facing the observer. Some implementations might
therefore forgo these tests, and accept the consequent
increase in visual artifacts, in order to maintain a high degree
of efficiency.

The depth gradients are specified for each of the x and y
screen coordinates. In this description, a z gradient of the
depth value Z of a fragment f in the x direction will be
referred to as zf_x; and a z gradient of the depth value Z of the
fragment f in the y direction will be referred to as zf_y. The
notation ||(zf_x, zf_y)|| represents the length of the vector (zf_x,
zf_y); and (xf_c, yf_c) refers to an approximation to the coordinates
of the fragment's centroid.

In a first method, the two fragments are determined to tilt
approximately the same amount away from the viewer when
the ratio of the lengths of the gradients of the two fragments
is between 1/n and n, for a relatively small constant n. In
particular, two fragments, fragment one and fragment two,
tilt approximately the same amount when:

max(||(z1_x, z1_y)||, ||(z2_x, z2_y)||) <= n * min(||(z1_x, z1_y)||, ||(z2_x, z2_y)||)

In other words, the two fragments, fragment one and
fragment two, tilt approximately the same amount when:

max(||(z1_x, z1_y)||, ||(z2_x, z2_y)||) / min(||(z1_x, z1_y)||, ||(z2_x, z2_y)||) <= n

To simplify the implementation, we need not compute the
lengths of the vectors, which involves a square root, but can
instead square both sides of the equation:

max(z1_x*z1_x + z1_y*z1_y, z2_x*z2_x + z2_y*z2_y) <= n^2 * min(z1_x*z1_x + z1_y*z1_y, z2_x*z2_x + z2_y*z2_y)

where "*" represents the multiplication operation.

Because we don't know the scale applied to Z values, we
cannot compute the exact angle that a fragment is tilted away
from the viewer. Instead, this test computes the ratio of the
tangents of the surface angles from the z axis
(which is perpendicular to the screen's x-y plane). If one of
the surfaces is nearly parallel with the screen (that is, viewed
face on), the minimum gradient length will be quite small,
yielding a very high ratio, which can cause an undesired
don't-merge outcome. If one of the surfaces is nearly
perpendicular to the screen (that is, viewed edge on), the
maximum gradient will be very large, again yielding a very
high ratio. In this case, the don't-merge outcome is
desirable, as the previously described Z projection of a
nearly edge on fragment may span a huge range of Z values.
Between these two extremes, the ratio provides a reasonable
approximation to the angular displacement between the two
surfaces. Choosing an appropriate value for n is difficult: too
small a value will cause many undesired don't-merge
outcomes, reducing the efficiency of the merge buffer. Too
large a value will cause many undesired outcomes for
nearly edge-on fragments. In a preferred embodiment, n
might be somewhere between 2 and 4, but one of ordinary
skill in the art will recognize that any suitable value could be
chosen in accordance with the principles of the present
invention.

In a second method, the two fragments are determined to
face approximately the same direction when the angle
between the vectors defined by the gradients in the (x, y)
plane is small. From trigonometry, we know that the cosine
of the angle between the two vectors is the dot product of the
vectors divided by their lengths:

cos(rotation) = (z1_x*z2_x + z1_y*z2_y) / (||(z1_x, z1_y)|| * ||(z2_x, z2_y)||)

There is no way to substantially simplify the computation
here. Either the actual lengths must be computed with square
roots, or if both sides of the equation are squared we end up
with lots of multiplies. We thus also observe from trigonometry
that we can compute the sine of the angle between
two vectors using the cross product:

sin(rotation) = (z1_x*z2_y - z1_y*z2_x) / (||(z1_x, z1_y)|| * ||(z2_x, z2_y)||)

And then further, we can eliminate the lengths of the
vectors:

tan(rotation) = (z1_x*z2_y - z1_y*z2_x) / (z1_x*z2_x + z1_y*z2_y)

We first test to ensure that the angle between the vectors
is smaller than 90° by testing that the tangent's denominator
is positive. We can also replace the divide with a multiply.
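
To make the first (tilt) method concrete, here is a minimal Python sketch of the squared form of the ratio test, which avoids square roots. Gradients are assumed to be (dz/dx, dz/dy) tuples; the function name and the default n = 3 (chosen from the 2 to 4 range mentioned in the text) are assumptions for this example.

```python
from typing import Sequence


def tilt_similar(g1: Sequence[float], g2: Sequence[float], n: float = 3.0) -> bool:
    """Return True when the gradient lengths differ by at most a factor of n.

    Compares squared lengths against n squared, so no square root is needed.
    """
    len1_sq = g1[0] * g1[0] + g1[1] * g1[1]
    len2_sq = g2[0] * g2[0] + g2[1] * g2[1]
    return max(len1_sq, len2_sq) <= n * n * min(len1_sq, len2_sq)


similar = tilt_similar((1.0, 0.0), (0.0, 2.0))        # lengths 1 and 2
face_on_reject = tilt_similar((0.01, 0.0), (1.0, 0.0))  # lengths 0.01 and 1
```

The second call shows the weakness discussed above: a fragment that is almost face-on has a very short gradient, so the ratio becomes large and the pair is rejected even though the surfaces may be close in orientation.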
A preferred alternate embodiment implements the rotation
test as:

    dot = z1_x * z2_x + z1_y * z2_y
    if (dot <= 0) {
        generate don't-merge outcome
    } else {
        cross = abs(z1_x * z2_y - z1_y * z2_x)
        if (cross < dot * tan(maximum rotation angle)) {
            proceed to further tests
        } else {
            generate don't-merge outcome
        }
    }

Although this test accurately determines the angle
between the normals in the x-y plane, it may still cause an
undesirable don't-merge outcome. Consider one normal
vector tilted 2° and rotated 0°. Consider another normal
vector tilted 2° and rotated 180°. Though the true angle
between these normal vectors is only 4°, the rotation test will
still reject a merge.

Z Similarity Test

The fourth criterion for merging is that the two fragments
have similar depth values, that is, the fragments are located
in the 3D scene at a similar distance from the viewer. In one
embodiment, depth similarity is measured by determining
the difference between the fragments' depth values. When
the difference exceeds a predetermined maximum, the fragments
are not sufficiently similar for merging. Otherwise,
when the difference does not exceed the predetermined
maximum, the fragments' depth values are sufficiently similar
for merging.

Basing the comparison on the magnitude of depth values
alone can cause problems because the depth values may not
be uniformly distributed. In other words, the magnitude of
the depth values is not fixed but relative. For example, if two
fragments have depth values that differ by one hundred
units, whether the fragments are close to each other in the
scene depends on how the depth values were assigned to all
objects in the scene. In some applications, a depth value
difference of one hundred units may indicate that the objects
are far apart, while in other applications a difference of ten
thousand units may indicate that the objects are close
together.

Therefore, the present invention measures depth similarity
using the rate at which the depth values change across
each of the fragments. Two exemplary methods are used to
determine whether the depth values of the fragments are
sufficiently similar. Each method uses the depth gradients to
extrapolate (project) the Z value at the first fragment's
centroid toward the second fragment, then tests to see if the
second fragment's Z value at its centroid is between the first
fragment's Z value and its projected Z value.

In the simplest method, the projection of one fragment
towards the other is determined using the product of the sum
of the gradients and the distance between the fragment
centroids as follows:

    projection = (x2_c - x1_c)*(z1_x + z2_x) + (y2_c - y1_c)*(z1_y + z2_y)

To determine whether the depth-similarity requirement is
met, the value of the projection is added to the first of the
two fragments' depth values. If the second fragment's depth
value falls between the first fragment's depth value and the
sum of the first fragment's depth value and the projection,
then the depths of these two fragments are deemed sufficiently
close to merge. An exemplary pseudo-code implementation
of this determination is as follows:

    projection = (x2_c - x1_c)*(z1_x + z2_x) + (y2_c - y1_c)*(z1_y + z2_y)
    if (projection < 0) {
        if ((Z1 > Z2) AND (Z1 + projection < Z2)) {
            // (fragment 1 is further away than fragment 2) AND
            // (projecting fragment 1 onto fragment 2 causes fragment
            // 1 to be closer than fragment 2)
            // depth similarity requirement met
        } else {
            // depth similarity requirement not met
        }
    } else {
        if ((Z1 <= Z2) AND (Z1 + projection >= Z2)) {
            // (fragment 1 is closer than fragment 2) AND
            // (projecting fragment 1 onto fragment 2 causes fragment 1 to
            // be further away than fragment 2)
            // depth similarity requirement met
        } else {
            // depth similarity requirement not met
        }
    }



An alternate representation more suitable for hardware
implementation is as follows:

    projection = (x2_c - x1_c)*(z1_x + z2_x) + (y2_c - y1_c)*(z1_y + z2_y)
    if (sign(Z2 - Z1) == sign(projection) AND
        sign(Z1 + projection - Z2) == sign(projection)) {
        // depth similarity requirement met
    }



FIG. 20 shows an exemplary hardware implementation of
a portion of the pseudo-code above. A projection block 652
determines the value of projection. For the values in the
horizontal, x, direction, a first subtractor 654 determines the
difference between x2_c and x1_c and a first adder 656 adds z1_x
and z2_x. A first multiplier 658 multiplies the output of the
subtractor 654 and the adder 656. Similarly, for the values in
the vertical, y, direction, a second subtractor 662 determines
the difference between y2_c and y1_c and a second adder 664
adds z1_y and z2_y. A second multiplier 666 multiplies the
output of the subtractor 662 and the adder 664. A third adder
668 sums the output of the first and second multipliers, 658,
666, respectively, to generate a value for projection.

A third subtractor 670 subtracts Z1 from Z2 and a sign bit
is output to form the term sign(Z2 - Z1) described above. A
fourth adder 672 adds the value of projection to Z1 and a
fourth subtractor 674 subtracts Z2 from that value. A first
exclusive-or (XOR) gate 676 generates the exclusive-or of
the sign bit of the projection value with the sign bit of the
value output by the subtractor 674. A second XOR gate 678
generates the exclusive-or of the sign bit of the projection
value with the sign bit of the value output by the subtractor
670. An AND gate 680 generates a signal indicating that the
depths of the fragments are sufficiently similar by performing
an AND operation on the inverted outputs of the XOR
gates 676, 678.

This method of testing depth similarity may generate a
don't-merge outcome for two fragments that are nearly face
on to the observer, but which are rotated substantially around
the Z axis from each other. This is due to the summing of the
gradient components. This summing reduces computation,
but also allows the two fragments' gradients to cancel each
other out.
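
The summed-gradient depth test, in the sign-comparison form suited to hardware, can be sketched in Python as below. The function names and the tuple layout (centroid as (x, y), gradient as (dz/dx, dz/dy)) are assumptions for this example, and `sign_bit` models a two's-complement sign bit, so zero counts as non-negative.

```python
from typing import Sequence


def sign_bit(value: float) -> bool:
    """Model of a hardware sign bit: True for negative values."""
    return value < 0


def depth_similar(z1: float, c1: Sequence[float], g1: Sequence[float],
                  z2: float, c2: Sequence[float], g2: Sequence[float]) -> bool:
    """Summed-gradient Z projection test for fragment 1 and fragment 2."""
    projection = ((c2[0] - c1[0]) * (g1[0] + g2[0])
                  + (c2[1] - c1[1]) * (g1[1] + g2[1]))
    s = sign_bit(projection)
    return sign_bit(z2 - z1) == s and sign_bit(z1 + projection - z2) == s


# Fragment 2 lies one pixel to the right; both surfaces rise by 1 per pixel.
within = depth_similar(10.0, (0, 0), (1, 0), 11.5, (1, 0), (1, 0))
# Fragment 2 is beyond the projected range.
beyond = depth_similar(10.0, (0, 0), (1, 0), 13.0, (1, 0), (1, 0))
# Opposite gradients cancel, the projection is zero, and the test fails
# even though the fragments are only 0.5 apart in Z.
cancelled = depth_similar(10.0, (0, 0), (1, 0), 10.5, (1, 0), (-1, 0))
```

The last call reproduces the cancellation problem noted above: summing the two gradients saves a multiply per axis, but opposing gradients can shrink the projected range to nothing.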

In an alternate embodiment, the depth values of two
fragments are determined to be sufficiently similar to allow
merging. This method is similar to the previous method
except that the gradients are not summed together, and so
two projections must be computed: one using the first
fragment's gradients, the other using the second fragment's
gradients. The first and second projections, respectively, are
formed as follows:

    projection1 = (x2_c - x1_c)*z1_x + (y2_c - y1_c)*z1_y
    projection2 = (x2_c - x1_c)*z2_x + (y2_c - y1_c)*z2_y

The depth values of the fragments are determined to be
sufficiently similar for merging if the second fragment's Z
value is between the first fragment's Z value and either the
sum of the first fragment's Z value and projection1 or the
sum of the first fragment's Z value and projection2. An
exemplary pseudo-code implementation of step 520 (FIG.
12) is as follows:

    if ((sign(Z2 - Z1) == sign(projection1) AND
         sign(Z1 + projection1 - Z2) == sign(projection1)) OR
        (sign(Z2 - Z1) == sign(projection2) AND
         sign(Z1 + projection2 - Z2) == sign(projection2))) {
        // depth similarity requirement met
    }

Although these exemplary techniques use the fragments'
centroids, in an alternate embodiment, other suitable points,
such as the center of the pixel, can be used instead in
accordance with the principles of the present invention.

Color Similarity Determination

The fifth criterion for merging is that the two fragments
have sufficiently similar color values. A number of methods
for comparing colors are possible, of which possibly the
simplest is to compute for the red, green, blue, and alpha
(RGBA) components of color the absolute value of the
difference between the value for one fragment and the value
for the other fragment. In addition to the other criteria,
fragments are determined to be sufficiently similar for merging
if the difference between each component is within a
predefined range, such as 0.03125 (1/32).

As shown in FIG. 21, in another embodiment, the color
components of the fragments are determined to be sufficiently
similar for merging if the sum of the squares of the
differences between each of the color components is smaller
than a constant. In step 698, the color similarity requirement
is met if diff(Red)^2 + diff(Green)^2 + diff(Blue)^2 + diff(Alpha)^2
is less than a predefined constant value, such as 0.00390625
(1/256).

In contrast to determining the similarity of the depth
values, the similarity of the color components is determined
using a constant, rather than a gradient, because the value of
each color component is uniformly distributed. Therefore,
color gradients need not be stored in the fragment storage of
the merge buffer.

Relaxing the Requirements for Merging

Although the embodiments discussed above presented
five criteria to determine whether two fragments are sufficiently
similar to be merged, in some embodiments fewer
criteria could be used, with a consequent increase in arti-

number of bits only moderately, and in particular would
substantially maintain the Z coordinate information. To test
if a merge is possible, it would compare primitive edges,
coverage masks, and colors; it would not implement any
orientation or Z similarity tests.

Merging Fragments Before Texturing

The embodiments described above place the merge buffer
380 after the texture mapping circuit 376. If merging is
instead performed prior to texture mapping, fewer fragments
will be texture mapped, thus increasing the performance of
the texture mapping circuit 376. Merging fragments prior to
texture mapping substantially increases the amount of data
stored in fragment memory 482, which may require more
chip real estate than improving texture mapping performance
by adding more texturing units in texture mapping
circuit 376. However, merging prior to texture mapping may
be particularly desirable if the texture mapping circuit 376
performs several parallel or sequential texture mapping
operations (multitexturing) on behalf of a fragment.

Merging pre-textured fragments requires augmenting the
five merge criteria described above with a further test
for closeness of texture coordinates. Unfortunately, the
rasterizing circuit 374 provides texture coordinates (u, v, w,
q) that have not yet been transformed to take into account
perspective distortion and mip-mapping level.

The most cost effective solution splits the texture-mapping
circuit 376 into two parts, and inserts the merge
buffer 380 between the parts. The first part performs texture
mapping coordinate calculations and mip-map selection.
The output of this part is then provided to the merge buffer
380, which can appropriately test texture map coordinates
for closeness before allowing a merge. The merge buffer 380
in turn feeds the second part, which contains logic that
accesses the texture data. By merging fragments prior to
accessing the texture map, the bandwidth requirements to
texture memory can be reduced, or, if a texture cache exists,
the number of ports may be reduced.

Texture Map Merge Criteria

In an alternate embodiment, an additional fragment merge
criterion may be based on the texture maps of the two
candidate fragments. Each fragment includes a texture map
coordinate tuple, and a corresponding texture map derivative
tuple, where the texture map derivative tuple specifies a rate
of change of each texture map coordinate with respect to x
and y directions. The merge criteria include a texture map
coordinate similarity requirement wherein each component
of a second fragment's texture map coordinate tuple must
fall between corresponding minimum and maximum values
generated using the first fragment's corresponding texture
map coordinate component, and the corresponding texture
map derivative tuple components of at least one of the first
and second fragments. The computation of minimum and
maximum texture map component values is similar to the Z
projection computation. The texture map coordinate similarity
requirement is preferably applied to both x and y
components of the texture maps of the fragments, but in
other embodiments may be applied against just one of the
components of the texture map coordinate tuples,
facts. However, a large amount of circuitry might be elimi- 

nated for a small decrease in image quality. The most Pipeline Coherency 

cost-effective implementation would avoid storing normal 65 If two fragments with the same tag arrive closely in time 

vectors, which are not available on many graphics accelera- at the merge buffer pipeline 394 (FIG. 6), the pipeline cannot 

tors. It would use a hash function on vertices that reduced the allow the fragment data to become incoherent. For example, 
if a first and second fragment are currently merging in the
merge buffer pipeline 394, then a third fragment with the
same tag cannot be allowed to merge with the first fragment
as well. (The first fragment is still visible in fragment storage
396.) To avoid this problem, the third fragment can be
stalled from entering the pipeline until the first and second
fragment exit the pipeline and are written back to fragment
storage 396 (FIG. 6). In this case, the third fragment will
attempt to merge only with the second fragment (if the first
and second fragment didn't merge), or with the merged
fragment (if they did merge). Alternately, the third fragment
can be allowed to enter the pipeline immediately, but
prohibited from merging with either the first or second
fragment. In this case the merged first and second fragments
must be marked "non-mergable" when they are written to
the fragment storage 396.
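
The first of the two options above (stalling a same-tag fragment until the in-flight merge has been written back) can be illustrated with a minimal software sketch. This is only a toy model under stated assumptions, not the circuit itself: the class name MergePipeline and the methods try_enter and write_back are hypothetical names introduced here, and a tag is assumed to be any hashable value such as an (x, y) pair.

```python
from dataclasses import dataclass, field


@dataclass
class MergePipeline:
    """Toy model that tracks which tags are currently inside the merge pipeline."""

    # Hypothetical bookkeeping: tags of fragments that entered but were not yet written back.
    in_flight_tags: set = field(default_factory=set)

    def try_enter(self, tag):
        """Admit a fragment unless a fragment with the same tag is still in flight."""
        if tag in self.in_flight_tags:
            return False  # stall: the caller retries after write_back(tag)
        self.in_flight_tags.add(tag)
        return True

    def write_back(self, tag):
        """Record that the fragment(s) for this tag were written back to fragment storage."""
        self.in_flight_tags.discard(tag)
```

In this sketch a third fragment whose tag matches an in-flight merge is refused by try_enter and is admitted only after write_back has been called for that tag, at which point it sees the updated contents of fragment storage.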

Similarly, fragments that are about to be ejected from a 
nearly full fragment storage 396 must not be allowed to enter 
the merge buffer pipeline 394. If the pipeline is nearly full, 
then the oldest fragment blocks must be ejected to make
room for new blocks exiting from the merge buffer pipeline. 
However, if one of these oldest blocks is also in the merge 
buffer pipeline in order to merge with a recent block, the old 
block cannot be ejected until it has emerged (in an updated 
form) from the pipeline. This results in a deadlock, where
the pipeline cannot write a block to the fragment storage, 
and the fragment storage cannot write a block to the output 
queue 392. A simple solution to this problem is to prohibit 
the oldest few blocks in a nearly full fragment storage from 
matching the tag of a new block entering the pipeline.
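
The simple solution described above can be sketched in software as a tag search that skips the oldest few blocks whenever the fragment storage is nearly full. This is a hedged illustration only: the function name find_merge_candidate and the parameters nearly_full_margin and protected_oldest are assumptions introduced here, and the text does not specify how many blocks count as "few" or how "nearly full" is detected.

```python
from collections import deque


def find_merge_candidate(storage, capacity, new_tag,
                         nearly_full_margin=2, protected_oldest=2):
    """Return the index of the newest stored block whose tag equals new_tag, or None.

    storage is a deque of (tag, block) pairs with the oldest block at index 0.
    When the storage is nearly full, the oldest protected_oldest blocks are
    excluded from matching so they can be ejected without waiting on the pipeline.
    """
    nearly_full = len(storage) >= capacity - nearly_full_margin
    first_eligible = protected_oldest if nearly_full else 0
    # Search from newest to oldest, stopping before the protected blocks.
    for i in range(len(storage) - 1, first_eligible - 1, -1):
        if storage[i][0] == new_tag:
            return i
    return None
```

Because a protected block can never be pulled into the pipeline, it is always available for ejection to the output queue, which breaks the circular wait described above at the cost of a few missed merges.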

Finally, there may be times when an application may wish 
to disable merging. In one implementation, if a mode bit is 
set to disable merging, all "mergable" bits in fragment 
storage 396 are set to "not mergable."

Other Merge Buffer Organizations 

The invention has been described implementing the frag- 
ment storage as a queue. The performance of the merge 
buffer as measured in the percentage of possible merges
actually effected may be increased by using a cache, with an 
associated increase in implementation complexity and cost. 
One functional difference between a cache implementation 
and a queue is that two blocks F and G of fragments with 
different tags may be ejected from the cache in an order that
is different from the order in which they were generated. In 
contrast, these blocks are ejected from the queue in genera- 
tion order. A second functional difference is that block F may 
be written around the cache should it be unlikely to merge 
in the future and should there be no other fragment in the
cache with the same tag. In so doing, the entries in the cache 
could be reserved for fragments that are more likely to 
merge, and hence, a higher rate of merging may occur. A 
third functional difference is that when two blocks N and E 
are merged/copied into a single block, if a queue is used, the
entry that used to store block E will now contain only invalid 
fragments, and this entry cannot be reused until the head 
pointer passes it. In contrast, with a cache, the entry could 
be reused sooner, and thus, a higher rate of merging may 
occur.
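
The third functional difference (a queue entry emptied by a merge cannot be reused until the head pointer passes it) can be illustrated with a small circular-queue sketch. The class FragmentQueue and its method names are hypothetical and exist only for this illustration; block contents are represented by arbitrary Python values.

```python
class FragmentQueue:
    """Circular queue of blocks; an emptied entry stays occupied until the head passes it."""

    def __init__(self, capacity):
        self.slots = [None] * capacity  # None marks an invalid or unused entry
        self.head = 0                   # index of the oldest entry
        self.count = 0                  # entries between head and tail, valid or not

    def push(self, block):
        """Append at the tail; fails while every slot lies between head and tail."""
        if self.count == len(self.slots):
            return False
        self.slots[(self.head + self.count) % len(self.slots)] = block
        self.count += 1
        return True

    def invalidate(self, index):
        """Mark an entry invalid after its fragments were merged/copied elsewhere."""
        self.slots[index] = None

    def advance_head(self):
        """Eject the oldest entry (possibly an invalid hole) and free its slot."""
        if self.count == 0:
            return None
        block = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return block
```

In this model, invalidating an entry does not change count, so a full queue stays full until advance_head is called. A cache-style organization could instead hand the invalidated slot to the next incoming block immediately.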

In both queue-based and cache-based implementations, 
the fragments corresponding to a given pixel are used to 
update that pixel in the order that the fragments were 
generated. Our preferred embodiment using a queue ensures 
that this ordering is maintained by allowing a new fragment
to merge with only the most-recently generated fragment for
the same pixel. A cache-based implementation can most 
simply meet this requirement by: (1) allowing only one copy
of a fragment with a given tag to be in the cache at a time, 
and (2) ejecting the block from the cache before writing a 
new block with the same tag either into the cache or around 
the cache. 
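
The two rules for a cache-based implementation can be sketched as follows. This is an illustrative model only: the class OrderedFragmentCache, its write method, and the bypass flag (standing in for writing a block around the cache) are names assumed here, and the sketch covers only the case in which the new block did not merge with the cached one.

```python
class OrderedFragmentCache:
    """Toy cache that preserves per-tag generation order on output."""

    def __init__(self):
        self.blocks = {}   # tag -> block; a dict key holds at most one block per tag (rule 1)
        self.output = []   # stands in for blocks sent on toward the frame buffer

    def write(self, tag, block, bypass=False):
        """Write a new, unmerged block into the cache, or around it when bypass is True."""
        if tag in self.blocks:
            # Rule 2: eject the older block with this tag before the new one is written.
            self.output.append((tag, self.blocks.pop(tag)))
        if bypass:
            self.output.append((tag, block))  # written around the cache
        else:
            self.blocks[tag] = block
```

With these two rules, blocks that share a tag always reach the output in generation order, even though blocks with different tags may leave the cache in any order.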

While the present invention has been described with 
reference to a few specific embodiments, the description is 
illustrative of the invention and is not to be construed as 
limiting the invention. Various modifications may occur to 
those skilled in the art without departing from the true spirit 
and scope of the invention as defined by the appended 
claims. 

What is claimed is: 

1. A graphics pipeline comprising: 

a rasterizer circuit that generates fragments for an image, 
the image having multiple surfaces, each surface tes- 
sellated into primitive objects; the image including a 
pixel having associated therewith a first and a second 
fragment; the first fragment being generated by the 
rasterizer circuit and having associated therewith an 
object comprising a respective primitive object of said 
primitive objects; and the second fragment being 
selected from a group consisting of a fragment gener- 
ated by the rasterizer circuit and having associated 
therewith an object comprising a respective primitive 
object of said primitive objects, and a combination of 
a plurality of fragments generated by the rasterizer 
circuit and having associated therewith an object com- 
prising a union of a plurality of respective primitive 
objects of said primitive objects; 

a merge buffer that combines the first fragment with the 
second fragment to create a new merged fragment that 
replaces the first and second fragment when predefined 
merge criteria are met, the predefined merge criteria 
include criteria that probabilistically establish that the 
first fragment's associated object is adjacent to the 
second fragment's associated object, that the first and 
second fragments are from a common tessellated sur- 
face of the multiple surfaces, and that the first and 
second fragments are sufficiently similar to avoid visu- 
ally objectionable artifacts when the first and second 
fragments are merged; and 

a frame buffer that receives fragments from the merge 
buffer, the frame buffer storing fragments and output- 
ting the fragments combined into pixels to a display. 

2. The graphics pipeline of claim 1 wherein the first and 
second fragments each include an ordered set of three- 
dimensional vertex triplets (x, y, z) specifying a subset of 
vertex locations for the fragment's associated object, and 
information specifying whether each edge of a subset of 
edges of the fragment's associated object bisects a rectan- 
gular block associated with the fragment; each edge in the 
subset of edges corresponding to the (x, y) components of a 
pair of the vertex triplets; 

the predefined merge criteria include requirements that 
two vertex locations of the first fragment match two 
vertex locations of the second fragment, that the sub- 
sets of edges of the first and second fragments both 
include an edge corresponding to the (x, y) components 
of the two matched vertex locations, and that the edge 
between the (x, y) components of the two matched 
vertex locations bisects the rectangular blocks associ- 
ated with the first and second fragments. 

3. The graphics pipeline of claim 1 wherein the first and 
second fragments each include a coverage mask indicating 
a set of sample points for the pixel associated with the 
fragment, that are inside the object associated with the 
fragment; 
the predefined merge criteria include a requirement that
the set of sample points indicated by the coverage mask 
of the first fragment and the set of sample points 
indicated by the coverage mask of the second fragment 
do not intersect. 

4. The graphics pipeline of claim 1 wherein the first and 
second fragments each include a three-dimensional normal 
vector, indicating a normal direction associated with the 
fragment; the first fragment's normal vector and second 
fragment's normal vector having an angle therebetween; 

the predefined merge criteria include a requirement that 
the angle between the first fragment's normal vector 
and second fragment's normal vector is smaller than a 
predefined maximum angle.

5. The graphics pipeline of claim 1 wherein the first and 
second fragments each include a z component of a normal 
vector, each normal vector indicating a normal direction 
associated with the fragment; 

the predefined merge criteria include a requirement that 
absolute values of the z component of the first and 
second fragment's normal vectors are both larger than 
a predefined minimum value. 

6. The graphics pipeline of claim 1 wherein the first and 
second fragments each include the sign of a z component of 
a normal vector, each normal vector indicating a normal 
direction associated with the fragment; 

the predefined merge criteria include a requirement that 
the signs of the z components of the first and second 
fragment's normal vectors indicate that both z compo- 
nents are non-negative, or that both are negative. 

7. The graphics pipeline of claim 1 wherein the first and 
second fragments each include shading information; 

the predefined merge criteria include a requirement that 
the shading information of both the first and second 
fragments indicates curved surface shading. 

8. The graphics pipeline of claim 1 wherein the first and 
second fragments each include a depth gradient vector that 
includes a first component, indicating a rate of change in 
depth value in a first direction, and second component, 
indicating a rate of change in depth value in a second 
direction; 

the predefined merge criteria include a requirement that 
value corresponding to a predefined function of the first 
and second components of the Z gradient vectors of 
first and second fragments be larger than a predefined 
minimum value and smaller than a predefined maxi- 
mum value. 

9. The graphics pipeline of claim 1 wherein the first and 
second fragments each include a depth gradient vector; 

the predefined merge criteria include a requirement that 
an angle between the depth gradient vector of the first 
fragment and the depth gradient vector of the second 
fragment be smaller than a predefined maximum angle. 

10. The graphics pipeline of claim 1 wherein the first and 
second fragments each include a depth value and a depth 
gradient vector; 

the predefined merge criteria include a depth similarity 
requirement wherein the depth value of one fragment of 
the first and second fragments must fall within a range 
of depth values generated using the depth value of the 
other fragment of the first and second fragments and the 
depth gradient vector of at least one of the first and 
second fragments. 

11. The graphics pipeline of claim 1 wherein the first and 
second fragments each include a depth value and a depth 
gradient vector; 
the predefined merge criteria include a depth similarity
requirement wherein a difference between the depth 
values of the second and first fragments must fall 
within a range of difference values generated using the 
depth gradient vectors of the first and second frag- 
ments. 

12. The graphics pipeline of claim 1 wherein 
the first and second fragments each include a color tuple; 

and 

the predefined merge criteria include a requirement that
the color tuple of the first fragment meet predefined 
color similarity criteria with respect to the color tuple 
of the second fragment. 

13. The graphics pipeline of claim 12, wherein each color 
tuple includes a plurality of elements, and the predefined 
color similarity criteria comprises a requirement that a sum 
of squares of differences between elements of the color tuple 
of the first fragment and elements of the color tuple of the 
second fragment be less than a predefined maximum value. 

14. The graphics pipeline of claim 12, wherein each color
tuple includes a plurality of elements, and the predefined 
color similarity criteria comprises a requirement that abso- 
lute values of the differences between elements of the color 
tuple of the first fragment and elements of the color tuple of 
the second fragment each be less than a predefined
maximum value.
15. The graphics pipeline of claim 1, wherein 
the first and second fragments each include a color tuple; 
and 

the predefined merge criteria include a requirement that
absolute values of the differences between elements of 
the color tuple of the first fragment and elements of the 
color tuple of the second fragment each be less than a 
predefined maximum color element difference value. 
16. The graphics pipeline of claim 1 wherein the first and
second fragments each include a texture map coordinate 
tuple, and corresponding texture map derivative tuples, 
where the texture map derivative tuples specify a rate of 
change of each texture map coordinate with respect to x and 
y directions; and

the predefined merge criteria include a texture map coor- 
dinate similarity requirement wherein a component of 
the texture map coordinate tuple of one fragment of the 
first and second fragments must fall within a
range of values generated using the corresponding
component of the texture map tuple of the other frag- 
ment of the first and second fragments and the texture 
map derivative tuple of at least one of the first and 
second fragments. 
17. The graphics pipeline of claim 1 wherein the rasterizer
circuit generates a likely-to-merge bit indicating whether a 
rectangular block associated with a fragment is bisected by 
a most recent internal edge of a sequence of adjacent objects, 
wherein said most recent internal edge would be shared by 
a next adjacent primitive object in the sequence of adjacent
primitive objects, if said sequence includes said next adja- 
cent primitive object. 

18. The graphics pipeline of claim 17 wherein 
the merge buffer contains a memory for storing a set of 
fragments to merge with new fragments, each stored
fragment being marked as one of likely-to-merge and 
not-likely-to-merge; and 
when the merge buffer memory is full, the merge buffer 
preferentially keeps in the merge buffer memory fragments
marked as likely-to-merge, and preferentially
replaces fragments marked as not-likely-to-merge with 
newer fragments. 
19. The graphics pipeline of claim 17 wherein

the merge buffer contains a memory for storing a set of 
fragments to merge with new fragments, each stored 
fragment being stored in a block within the merge 
buffer, each block having capacity to store more than 
one fragment and being marked as one of likely-to-merge
and not-likely-to-merge; and

when the merge buffer memory is full, the merge buffer 
preferentially keeps in the merge buffer memory blocks 
marked as likely-to-merge, and preferentially replaces 
blocks marked as not-likely-to-merge with blocks
containing newer fragments.

20. The graphics pipeline of claim 19 wherein 

the merge buffer is configured to replace the first fragment 
with the new merged fragment when the block associ- 
ated with the first fragment is marked likely-to-merge 
and to otherwise replace the second fragment with the 
new merged fragment. 

21. The graphics pipeline of claim 1 wherein the merge 
buffer includes a queue for storing a set of fragments to 
merge with new fragments. 

22. The graphics pipeline of claim 1 wherein the merge 
buffer includes a cache for storing a set of fragments to 
merge with new fragments. 

23. The graphics pipeline of claim 1 wherein 

the merge buffer contains a memory for storing a set of 
fragments to potentially merge with new fragments, 
each stored fragment being stored in a block within the 
merge buffer, each block having capacity to store more 
than one fragment and storing a plurality of parameters 
applicable to all fragments stored within the block. 

24. The graphics pipeline of claim 1 wherein the merge 
buffer includes an evaluation stage circuit that performs 
computations on the first and second fragments to determine 
whether the predefined merge criteria are met, and a frag- 
ment merging stage circuit for conditionally merging the 
first and second fragments to generate the new merged 
fragment in accordance with an outcome generated by the 
evaluation stage circuit, wherein the fragment merging stage 
circuit is configured to receive at least one value, other than 
said outcome, computed by the evaluation stage circuit and 
to utilize at least one received value as an input to a 
computation for computing a characteristic of the new 
merged fragment. 

25. The graphics pipeline of claim 1 wherein 

the first and second fragments each include a depth 
gradient vector that includes a first component, indi- 
cating a rate of change in depth value in a first direction, 
and second component, indicating a rate of change in 
depth value in a second direction; and 

the merge buffer includes an evaluation stage circuit that 
performs computations on the first and second frag- 
ments to determine whether the predefined merge cri- 
teria are met, and a fragment merging stage circuit for 
conditionally merging the first and second fragments to 
generate the new merged fragment in accordance with 
an outcome generated by the evaluation stage circuit, 
wherein the fragment merging stage circuit is config- 
ured to generate a depth gradient vector for the new 
merged fragment by selecting whichever of the depth 
gradient vectors of the first and second fragments has a 
smaller length and using the selected depth gradient 
vector as the depth gradient vector of the new merged 
fragment. 

26. The graphics pipeline of claim 1 further comprising a 
texture mapping circuit configured to receive fragments 
from the rasterizer circuit, apply a texture map to the
fragments, and outputting the fragments to the merge buffer. 

27. Image processing apparatus comprising: 
fragment storage storing fragment tuples, each stored
fragment tuple being associated with a fragment in a 
pixel of an image having a plurality of pixels, the image 
having multiple surfaces, each surface tessellated into 
primitive objects; each fragment tuple including a color 
value and a depth value; 
a merge pipeline processing circuit for processing a new 
fragment tuple representing a fragment to be added to 
the image, the pipeline processing circuit including a 
sequence of pipeline stage circuits, including: 
a tag comparison stage circuit for identifying a poten- 
tially mergable fragment tuple, comprising one of 
the fragment tuples in the fragment storage; the new 
fragment tuple having associated therewith a first 
object comprising a respective primitive object of 
said primitive objects, and the potentially mergable 
fragment tuple having associated therewith a second 
object selected from a group consisting of a respec- 
tive primitive object of said primitive objects and a 
union of a plurality of respective primitive objects of 
said primitive objects; 
an evaluation stage circuit for generating an outcome 
based on whether predefined merge criteria are met, 
the predefined merge criteria include criteria that 
probabilistically establish that the object associated 
with the new fragment tuple is adjacent to the object 
associated with the potentially mergable fragment 
tuple, that the new fragment tuple and potentially 
mergable fragment tuple are associated with frag- 
ments from a common tessellated surface of the 
multiple surfaces, and that the first and second frag- 
ments are sufficiently similar to avoid visually objec- 
tionable artifacts when the first and second fragments 
are merged; 

a fragment merging stage circuit for merging the color 
values and depth values of the new fragment tuple 
and the potentially mergable fragment tuple to gen- 
erate a merged fragment tuple based on the outcome 
of the evaluation stage; and 

an update fragment storage stage circuit for storing the 
merged fragment tuple in the fragment storage if the 
predefined merge criteria are met, and for storing the 
new fragment tuple in the fragment storage if the 
predefined merge criteria are not met. 

28. The image processing apparatus of claim 27 wherein
each of the fragment tuples in the fragment storage has
associated therewith an x-y position tag; and 
the tag comparison stage circuit is configured to identify 
the potentially mergable fragment tuple by comparing 
an x-y position tag of the new fragment tuple with the 
x-y position tags of the fragment tuples in the fragment 
storage. 

29. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include a depth gradient vector; 
and 

the evaluation stage circuit generates the outcome based 
on the color values, the depth values and the depth 
gradient vectors of the new fragment tuple and the 
potentially mergable fragment tuple. 

30. The image processing apparatus of claim 27, wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include an ordered set of three-dimensional
vertex triplets (x, y, z) specifying a subset
of vertex locations for the fragment tuple's associated 
object, and information specifying whether each edge 
of a subset of edges of the fragment's associated object 
bisects a rectangular block associated with the fragment
tuple; each edge in the subset of edges corresponding to 
the (x, y) components of a pair of the vertex triplets; 
and 

the predefined merge criteria include requirements that 
two vertex locations of the new fragment tuple match
two vertex locations of the potentially mergable frag- 
ment tuple, that the subsets of edges of the first and 
second fragments both include an edge corresponding 
to the (x, y) components of the two matched vertex 
locations, and that the edge between the (x, y) components
of the two matched vertex locations bisects the
rectangular blocks associated with the new fragment 
tuple and the potentially mergable fragment tuple. 

31. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include a coverage mask indicating 
a set of sample points for the pixel associated with the 
fragment, that are inside the object associated with the 
fragment; and 

the predefined merge criteria include a requirement that 
the set of sample points indicated by the coverage mask 
of the new fragment tuple and the set of sample points 
indicated by the coverage mask of the potentially 
mergable fragment tuple do not intersect.

32. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include a three-dimensional nor- 
mal vector, indicating a normal direction associated 
with the fragment; the new fragment tuple's normal
vector and potentially mergable fragment tuple's nor- 
mal vector having an angle therebetween; and 
the predefined merge criteria include a requirement that 
the angle between the new fragment tuple's normal 
vector and the potentially mergable fragment tuple's
normal vector is smaller than a predefined maximum 
angle. 

33. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include a z component of a normal
vector, each normal vector indicating a normal direc- 
tion associated with the fragment; and 
the predefined merge criteria include a requirement that 
absolute values of the z component of the new and 
potential mergable fragment tuples' normal vectors are
both larger than a predefined minimum value. 

34. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include the sign of a z component
of a normal vector, each normal vector indicating a 
normal direction associated with the fragment; 
the predefined merge criteria include a requirement that 
the signs of the z components of the new and potential 
mergable fragment tuples' normal vectors indicate that
both z components are non-negative, or that both are 
negative. 

35. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include shading information; and
the predefined merge criteria include a requirement that 
the shading information of both the new fragment tuple 
and the potentially mergable fragment tuple indicates
curved surface shading. 

36. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include a depth gradient vector that 
includes a first component, indicating a rate of change 
in depth value in a first direction, and second 
component, indicating a rate of change in depth value 
in a second direction; and 
the predefined merge criteria include a requirement that 
value corresponding to a predefined function of the first 
and second components of the Z gradient vectors of 
first and second fragments be larger than a predefined 
minimum value and smaller than a predefined maxi- 
mum value. 

37. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include a depth gradient vector; 
and 

the predefined merge criteria include a requirement that 
an angle between the depth gradient vector of the new 
fragment tuple and the depth gradient vector of the 
potentially mergable fragment tuple be smaller than a 
predefined maximum angle. 

38. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include a depth gradient vector; 
and 

the predefined merge criteria include a depth similarity 
requirement wherein the depth value of one fragment of 
the new and potentially mergable fragment tuples must 
fall within a range of depth values generated using the 
depth value of the other fragment of the new and 
potentially mergable fragment tuples and the depth 
gradient vector of at least one of the new and poten- 
tially mergable fragment tuples. 

39. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include a depth gradient vector; 
and 

the predefined merge criteria include a depth similarity 
requirement wherein a difference between the depth 
values of the potentially mergable and new fragment 
tuples must fall within a range of difference values 
generated using the depth gradient vectors of the new 
and potentially mergable fragment tuples. 

40. The image processing apparatus of claim 27 wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include a color tuple; and 
the predefined merge criteria include a requirement that 
the color tuple of the new fragment tuple meet pre- 
defined color similarity criteria with respect to the color 
tuple of the potentially mergable fragment tuple. 

41. The image processing apparatus of claim 40, wherein 
each color tuple includes a plurality of elements, and the 
predefined color similarity criteria comprises a requirement 
that a sum of squares of differences between elements of the 
color tuple of the new fragment tuple and elements of the 
color tuple of the potentially mergable fragment tuple be less 
than a predefined maximum value. 

42. The image processing apparatus of claim 40, wherein 
each color tuple includes a plurality of elements, and the 
predefined color similarity criteria comprises a requirement 
that absolute values of the differences between elements of 
the color tuple of the new fragment tuple and elements of the 
color tuple of the potentially mergable fragment tuple each
be less than a predefined maximum value.

43. The image processing apparatus of claim 27, wherein 
the new fragment tuple and the potentially mergable
fragment tuple each include a color tuple; and
the predefined merge criteria include a requirement that 
absolute values of the differences between elements of 
the color tuple of the new fragment tuple and elements 
of the color tuple of the potentially mergable fragment 
tuple each be less than a predefined maximum color
element difference value. 

44. The image processing apparatus of claim 27, wherein 
the fragment storage includes a plurality of blocks for
storing the stored fragment tuples, each block having 
capacity to store more than one fragment tuple and 
storing a plurality of parameters applicable to all fragment
tuples stored within the block.

45. The image processing apparatus of claim 27, wherein 
the evaluation stage circuit is configured to perform com- 
putations on the new and potentially mergable fragment 
tuples to determine whether the predefined merge criteria are 
met, and the fragment merging stage circuit is configured to 
receive at least one value, other than said outcome, com- 
puted by the evaluation stage circuit and to utilize at least 
one received value as an input to a computation for com- 
puting a characteristic of the merged fragment tuple. 

46. The image processing apparatus of claim 27, wherein 
the new and potentially mergable fragment tuples each
include a depth gradient vector that includes a first
component, indicating a rate of change in depth value
in a first direction, and second component, indicating a
rate of change in depth value in a second direction; and
the fragment merging stage circuit is configured to
conditionally generate a depth gradient vector for the
merged fragment tuple by selecting whichever of the
depth gradient vectors of the new and potentially
mergable fragment tuples has a smaller length and
using the selected depth gradient vector as the depth
gradient vector of the merged fragment tuple.
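The gradient selection recited in claim 46 may be illustrated by the following sketch. The function name `merged_depth_gradient` and the tie-breaking rule (keeping the stored vector when the lengths are equal) are assumptions; the claim does not say how ties are resolved or under what condition the selection is performed.

```python
import math
from typing import Tuple

Gradient = Tuple[float, float]  # (rate of depth change in x, rate of depth change in y)


def merged_depth_gradient(new_grad: Gradient, stored_grad: Gradient) -> Gradient:
    """Return whichever depth gradient vector has the smaller Euclidean length."""
    new_len = math.hypot(new_grad[0], new_grad[1])
    stored_len = math.hypot(stored_grad[0], stored_grad[1])
    # Assumption: on a tie, the stored fragment's gradient is kept.
    return new_grad if new_len < stored_len else stored_grad


print(merged_depth_gradient((3.0, 4.0), (1.0, 1.0)))  # (1.0, 1.0): length ~1.41 is below 5.0
```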

47. A method of rendering an image, the method com- 
prising: 

generating fragments for the image, the image having
multiple surfaces, each surface tessellated into primitive
objects; the image including a pixel having associated
therewith a first and a second fragment; the first
fragment being one of the generated fragments and
having associated therewith an object comprising a
respective primitive object of said primitive objects,
and the second fragment being selected from a group
consisting of a generated fragment and having associated
therewith an object comprising a respective primitive
object of said primitive objects, and a combination
of a plurality of generated fragments and having associated
therewith an object comprising a union of a
plurality of respective primitive objects of said primitive
objects;

conditionally merging the first fragment with the second
fragment to create a new merged fragment that replaces
the first and second fragment when predefined merge
criteria are met, the predefined merge criteria include
criteria that probabilistically establish that the first
fragment's associated object is adjacent to the second
fragment's associated object, that the first and second
fragments are from a common tessellated surface of the
multiple surfaces, and that the first and second fragments
are sufficiently similar to avoid visually objectionable
artifacts when the first and second fragments
are merged; and

storing in a frame buffer fragments from among the
generated fragments and the new merged fragment,
combining the fragments into pixels and outputting the
pixels to a display.

48. A method of rendering an image, the image having a 
plurality of pixels, the image furthermore having multiple 
surfaces, each surface tessellated into primitive objects; the 
method comprising: 

storing fragment tuples, each stored fragment tuple being 
associated with a fragment in a pixel of the image, each 
fragment tuple including a color value and a depth 
value; 

processing a new fragment tuple representing a fragment 
to be added to a particular pixel of the plurality of 
pixels, the new fragment tuple having a color value and 
a depth value; 
the processing of the new fragment tuple including: 
comparing the new fragment tuple and a selected 
fragment tuple of the stored fragment tuples to 
generate a merge outcome based on whether pre- 
defined merge criteria are met, the new fragment 
having associated therewith a first object comprising 
a respective primitive object of said primitive 
objects, and the selected fragment having associated 
therewith a second object selected from a group 
consisting of a respective primitive object of said 
primitive objects and a union of a plurality of respec- 
tive primitive objects of said primitive objects; 
the predefined merge criteria include criteria that 
probabilistically establish that the first object, asso- 
ciated with the new fragment tuple, is adjacent to the 
second object, associated with the selected fragment 
tuple, that the new fragment tuple and selected 
fragment tuple are associated with fragments from a 
common tessellated surface of the multiple surfaces, 
and that the first and second fragments are suffi- 
ciently similar to avoid visually objectionable arti- 
facts when the first and second fragments are 
merged; and 

merging the new fragment tuple with the selected 
fragment tuple to produce a merged fragment tuple 
when the merge outcome has a predefined value. 

49. The method of claim 48 wherein 

each of the stored fragment tuples has associated there- 
with an x-y position tag; and 

the selected fragment tuple is selected by comparing an 
x-y position tag of the new fragment tuple with the x-y 
position tags of the stored fragment tuples. 
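The tag comparison of claim 49 may be sketched as a simple lookup, shown below. Representing each stored fragment tuple as a dictionary with an `xy` key, and the function name `select_by_position`, are assumptions made for the example only.

```python
from typing import Dict, List, Tuple


def select_by_position(stored: List[Dict[str, object]],
                       new_tag: Tuple[int, int]) -> List[Dict[str, object]]:
    """Return the stored fragment tuples whose x-y position tag equals new_tag."""
    return [frag for frag in stored if frag["xy"] == new_tag]


# Hypothetical stored fragment tuples, each carrying an x-y position tag.
stored_tuples = [
    {"xy": (10, 20), "depth": 0.40},
    {"xy": (11, 20), "depth": 0.41},
    {"xy": (10, 20), "depth": 0.75},
]
candidates = select_by_position(stored_tuples, (10, 20))
print(len(candidates))  # 2: both fragments stored for pixel (10, 20)
```

A hardware merge buffer would presumably compare all tags in parallel; the linear scan here only illustrates the selection rule.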

50. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include a depth gradient vector; and 

the merge outcome is based on the color values, the depth 
values and the depth gradient vectors of the new 
fragment tuple and the selected fragment tuple. 

51. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include an ordered set of three-dimensional vertex 
triplets (x, y, z) specifying a subset of vertex locations 
for the fragment tuple's associated object, and infor- 
mation specifying whether each edge of a subset of 
edges of the fragment tuple's associated object bisects 
a rectangular block associated with the fragment tuple; 
each edge in the subset of edges corresponding to the 
(x, y) components of a pair of the vertex triplets; 






the predefined merge criteria include requirements that 
two vertex locations of the new fragment tuple match 
two vertex locations of the selected fragment tuple, that 
the subsets of edges of the first and second fragments 
both include an edge corresponding to the (x, y)
components of the two matched vertex locations, and that
the edge between the (x, y) components of the two
matched vertex locations bisects the rectangular blocks
associated with the new fragment tuple and the selected
fragment tuple.

52. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include a coverage mask indicating a set of sample 
points for the pixel associated with the fragment, that 
are inside the object associated with the fragment; and

the predefined merge criteria include a requirement that 
the set of sample points indicated by the coverage mask 
of the new fragment tuple and the set of sample points 
indicated by the coverage mask of the selected frag- 
ment tuple do not intersect. 
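If each coverage mask is held as a bit field with one bit per sample point, the non-intersection requirement of claim 52 reduces to a bitwise AND, as in the sketch below. The eight-sample mask width and the function name `masks_disjoint` are assumptions for the example.

```python
def masks_disjoint(new_mask: int, selected_mask: int) -> bool:
    """Return True when no sample point is covered by both fragments."""
    return (new_mask & selected_mask) == 0


# Hypothetical 8-sample coverage masks, one bit per sample point.
print(masks_disjoint(0b00001111, 0b11110000))  # True: no shared sample points
print(masks_disjoint(0b00011111, 0b11110000))  # False: one sample point is shared
```

Disjoint masks are what one would expect from two adjacent primitives of the same surface, since each sample point lies inside at most one of them.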

53. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include a three-dimensional normal vector, indicating
a normal direction associated with the fragment;
the new fragment tuple's normal vector and selected
fragment tuple's normal vector having an angle
therebetween;

the predefined merge criteria include a requirement that
the angle between the new fragment tuple's normal
vector and the selected fragment tuple's normal vector
is smaller than a predefined maximum angle.
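The angle requirement of claim 53 may be evaluated from the dot product of the two normal vectors, as sketched below. The function name `normals_within_angle`, the use of radians, and the 10 degree limit in the usage lines are assumptions; a hardware implementation might instead compare the cosine against a precomputed threshold to avoid the inverse cosine.

```python
import math
from typing import Sequence


def normals_within_angle(n1: Sequence[float], n2: Sequence[float],
                         max_angle_rad: float) -> bool:
    """Return True when the angle between n1 and n2 is below max_angle_rad."""
    dot = sum(a * b for a, b in zip(n1, n2))
    len1 = math.sqrt(sum(a * a for a in n1))
    len2 = math.sqrt(sum(b * b for b in n2))
    if len1 == 0.0 or len2 == 0.0:
        raise ValueError("normal vectors must be non-zero")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_angle = max(-1.0, min(1.0, dot / (len1 * len2)))
    return math.acos(cos_angle) < max_angle_rad


limit = math.radians(10.0)  # hypothetical maximum angle
tilted = (0.0, math.sin(math.radians(5.0)), math.cos(math.radians(5.0)))
print(normals_within_angle((0.0, 0.0, 1.0), tilted, limit))           # True: 5 degrees apart
print(normals_within_angle((0.0, 0.0, 1.0), (1.0, 0.0, 0.0), limit))  # False: 90 degrees apart
```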

54. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include a z component of a normal vector, each
normal vector indicating a normal direction associated
with the fragment; and

the predefined merge criteria include a requirement that
absolute values of the z component of the new and
selected fragment tuples' normal vectors are both larger
than a predefined minimum value. 

55. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include the sign of a z component of a normal 
vector, each normal vector indicating a normal direction
associated with the fragment; and

the predefined merge criteria include a requirement that 
the signs of the z components of the new and selected 
fragment tuples' normal vectors indicate that both z 
components are non-negative, or that both are negative. 
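Because claim 55 needs only the sign of each z component, the test can operate on a single stored bit per fragment tuple. The sketch below assumes a boolean "negative" flag and the hypothetical names `is_negative` and `same_facing`; a zero z component is treated as non-negative, consistent with the wording of the claim.

```python
def is_negative(normal_z: float) -> bool:
    """Sign flag for a normal's z component; zero counts as non-negative."""
    return normal_z < 0.0


def same_facing(new_negative: bool, selected_negative: bool) -> bool:
    """True when both z components are non-negative or both are negative."""
    return new_negative == selected_negative


print(same_facing(is_negative(0.0), is_negative(0.7)))    # True: both non-negative
print(same_facing(is_negative(-0.2), is_negative(0.7)))   # False: signs differ
print(same_facing(is_negative(-0.2), is_negative(-0.9)))  # True: both negative
```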

56. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include shading information; and 

the predefined merge criteria include a requirement that
the shading information of both the new fragment tuple 
and the selected fragment tuple indicates curved sur- 
face shading. 

57. The method of claim 48 wherein

the new fragment tuple and the selected fragment tuple
each include a depth gradient vector that includes a first
component, indicating a rate of change in depth value
in a first direction, and second component, indicating a
rate of change in depth value in a second direction; and

the predefined merge criteria include a requirement that
a value corresponding to a predefined function of the first
and second components of the Z gradient vectors of
the first and second fragments be larger than a predefined
minimum value and smaller than a predefined maximum
value.

58. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include a depth gradient vector; and 

the predefined merge criteria include a requirement that 
an angle between the depth gradient vector of the new 
fragment tuple and the depth gradient vector of the 
selected fragment tuple be smaller than a predefined 
maximum angle.

59. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include a depth gradient vector; and 

the predefined merge criteria include a depth similarity 
requirement wherein the depth value of one fragment of 
the new and selected fragment tuples must fall within 
a range of depth values generated using the depth value 
of the other fragment of the new and selected fragment 
tuples and the depth gradient vector of at least one of 
the new and selected fragment tuples. 
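One conceivable way to generate the range of claim 59 is sketched below. This is not the formula recited in claim 68; the slack term `(|zx| + |zy|) * extent`, the `extent` parameter, and the function name `depth_within_range` are all assumptions chosen to show how a depth gradient can widen the acceptable range for steeply sloped surfaces.

```python
from typing import Tuple


def depth_within_range(z_a: float, z_b: float,
                       grad_b: Tuple[float, float],
                       extent: float = 1.0) -> bool:
    """Return True when z_a lies within a gradient-derived range around z_b.

    Assumption: depth may change by at most (|zx| + |zy|) * extent
    across the region separating the two fragments.
    """
    zx, zy = grad_b
    slack = (abs(zx) + abs(zy)) * extent
    return z_b - slack <= z_a <= z_b + slack


# Hypothetical depth values and gradient; the range is 0.50 +/- 0.03.
print(depth_within_range(0.52, 0.50, (0.01, 0.02)))  # True
print(depth_within_range(0.60, 0.50, (0.01, 0.02)))  # False
```

A fixed depth tolerance would be either too strict for surfaces seen nearly edge-on or too loose for surfaces facing the viewer, which is why the range is derived from the gradient.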

60. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include a depth gradient vector; and 

the predefined merge criteria include a depth similarity 
requirement wherein a difference between the depth 
values of the selected and new fragment tuples must 
fall within a range of difference values generated using 
the depth gradient vectors of the new and selected 
fragment tuples. 

61. The method of claim 48 wherein 

the new fragment tuple and the selected fragment tuple 
each include a color tuple; and 

the predefined merge criteria include a requirement that 
the color tuple of the new fragment tuple meet pre- 
defined color similarity criteria with respect to the color 
tuple of the selected fragment tuple. 

62. The method of claim 61, wherein each color tuple 
includes a plurality of elements, and the predefined color 
similarity criteria comprises a requirement that a sum of 
squares of differences between elements of the color tuple of 
the new fragment tuple and elements of the color tuple of the 
selected fragment tuple be less than a predefined maximum 
value. 

63. The method of claim 61, wherein each color tuple 
includes a plurality of elements, and the predefined color 
similarity criteria comprises a requirement that absolute 
values of the differences between elements of the color tuple 
of the new fragment tuple and elements of the color tuple of 
the selected fragment tuple each be less than a predefined 
maximum value. 

64. The method of claim 48, wherein 

the new fragment tuple and the selected fragment tuple 
each include a color tuple; and 

the predefined merge criteria include a requirement that 
absolute values of the differences between elements of 
the color tuple of the new fragment tuple and elements 
of the color tuple of the selected fragment tuple each be 
less than a predefined maximum color element
difference value.

65. The method of claim 48, wherein 

the fragment tuple storing includes storing the fragment 
tuples in a plurality of blocks, each block having 
capacity to store more than one fragment tuple and 
storing a plurality of parameters applicable to all
fragment tuples stored within the block.

66. The method of claim 48, wherein the comparing
includes performing computations on the new and selected 
fragment tuples to determine whether the predefined merge 
criteria are met, and the merging includes receiving at least 
one value, other than said outcome, computed during the 
comparing and utilizing the at least one received value as an 
input to a computation for computing a characteristic of the 
merged fragment tuple. 

67. The method of claim 48, wherein 

the new and selected fragment tuples each include a depth 
gradient vector that includes a first component, indi- 
cating a rate of change in depth value in a first direction, 
and second component, indicating a rate of change in 
depth value in a second direction; and 

the merging conditionally generates a depth gradient 
vector for the merged fragment tuple by selecting 
whichever of the depth gradient vectors of the new and 
selected fragment tuples has a smaller length and using 
the selected depth gradient vector as the depth gradient 
vector of the merged fragment tuple. 

68. The method of claim 48, wherein 

one of the new and the selected fragment tuples has
associated therewith a $Z_{1}$ depth value, an $x_{1c}$ centroid
value and a $y_{1c}$ centroid value, a $z_{1x}$ gradient value and
a $z_{1y}$ gradient value, and the other fragment tuple has
associated therewith a $Z_{2}$ depth value and an $x_{2c}$
centroid value and a $y_{2c}$ centroid value, a $z_{2x}$ gradient
value and a $z_{2y}$ gradient value; and

said comparing includes determining that the depth values 
of the new and selected fragment tuples are similar 
when

69. The method of claim 48 wherein

one of the new and the selected fragment tuples has
associated therewith a $Z_{1}$ depth value, an $x_{1c}$ centroid
value and a $y_{1c}$ centroid value, a $z_{1x}$ gradient value and
a $z_{1y}$ gradient value, and the other fragment tuple has
associated therewith a $Z_{2}$ depth value and an $x_{2c}$
centroid value and a $y_{2c}$ centroid value, a $z_{2x}$ gradient
value and a $z_{2y}$ gradient value, and



70. The method of claim 48 wherein 
one of the new and the selected fragment tuples has
associated therewith a $z_{1x}$ gradient value and a $z_{1y}$
gradient value, and the other fragment tuple has associated
therewith a $z_{2x}$ gradient value and a $z_{2y}$ gradient
value, $\|(z_{1x}, z_{1y})\|$ represents the length of the vector
$(z_{1x}, z_{1y})$, and $\|(z_{2x}, z_{2y})\|$ represents the length of the
vector $(z_{2x}, z_{2y})$, and
said comparing including determining that the selected 
and the new fragment tuples face in similar directions 
when 

71. The method of claim 48 wherein 

one of the new and the selected fragment tuples has 
associated therewith a $z_{1x}$ gradient value and a $z_{1y}$
gradient value, and the other fragment tuple has associated
therewith a $z_{2x}$ gradient value and a $z_{2y}$ gradient
value,

said comparing including determining that the selected 
and the new fragment tuples face a same direction 
when 



$$\operatorname{sign}(z_{1x}) = \operatorname{sign}(z_{2x}),$$

and

$$\operatorname{sign}(z_{1y}) = \operatorname{sign}(z_{2y}).$$



